RobertM

LessWrong dev & admin as of July 5th, 2022.

Comments

(Separately, even accepting for the sake of argument that you notice most work done and have a negative reaction to it, that is not very strong counterevidence to the original claim.)

If the only thing you see about Aella is that she had work done on her lips, then I think that sufficiently well demonstrates the point that you don't notice most "high quality" plastic surgery.

RobertM

They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it.

That's surprising and concerning.  As you say, if these companies expect their AIs to do end-to-end engineering and R&D tasks internally, it seems difficult to imagine how they could do that without having employee-level privileges.  Any place where they don't is a place where humans turn into a bottleneck.  I can imagine a few possible objections to this:

  1. They don't expect those bottlenecks to impose that much of a penalty.
    1. I'm not sure how this could be true unless you think that AI systems will be pretty weak, which is sort of fighting the premise.  What are we worried about, again?
  2. They expect the bottlenecks to impose a large penalty and plan on biting that bullet when the time comes.
    1. I currently roll to disbelieve based on both the publicly-held positions of the relevant organizations and also their historical track records.  Incentives here seem quite bad.
  3. They plan on trying to thread the needle by employing control schemes where (for example) different "agents" have different permissions: a "code writing" agent has read permissions for (some parts of) the codebase, the ability to write, deploy, and test changes to that code in a sandboxed dev environment, and the ability to open a pull request with those changes.  Another set of agents has permission to review pull requests, and then request changes, approve/merge them, or flag the PR as suspicious.  Yet another set acts as gatekeepers for sensitive data that might be needed for some experiments, but only with good justification.  (A rough sketch of what this might look like follows this list.)
    1. This still suffers from the incentive gradient pushing quite hard to just build end-to-end agents.  Not only will it probably work better, but it'll be straight up cheaper and easier!
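
To make option 3 concrete, here is a minimal sketch of how such a permission split might be expressed.  The role names and permission strings are hypothetical illustrations, not anything from an actual lab's infrastructure:

    // Hypothetical permission model for option 3: each agent role gets a narrow,
    // explicitly enumerated capability set instead of blanket employee-level access.
    type Permission =
      | "repo:read"
      | "sandbox:write"
      | "sandbox:test"
      | "pr:open"
      | "pr:review"
      | "pr:merge"
      | "pr:flag-suspicious"
      | "data:sensitive-read";

    type AgentRole = "codeWriter" | "reviewer" | "dataGatekeeper";

    const rolePermissions: Record<AgentRole, Permission[]> = {
      // Writes and tests code in a sandbox and opens PRs, but cannot merge them.
      codeWriter: ["repo:read", "sandbox:write", "sandbox:test", "pr:open"],
      // Reviews and merges PRs, but cannot author or deploy changes itself.
      reviewer: ["repo:read", "pr:review", "pr:merge", "pr:flag-suspicious"],
      // Grants access to sensitive data only on a case-by-case justification.
      dataGatekeeper: ["data:sensitive-read"],
    };

    function isAllowed(role: AgentRole, action: Permission): boolean {
      return rolePermissions[role].includes(action);
    }

    // A code-writing agent trying to merge its own PR is rejected:
    console.log(isAllowed("codeWriter", "pr:merge")); // false

The point of the split is that no single agent holds the whole write-test-review-merge pipeline, which is exactly the property the incentive gradient in 3.1 pushes against.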

Like, to be clear, I would definitely prefer a world where these organizations wrote "small and carefully locked-down infrastructure" as the limited surface their AIs were allowed to interact with; I just don't expect that to actually happen in practice.

Answer by RobertM

This comment describes how the images for the "Best of LessWrong" (review winners) were generated.  (The exact workflow has varied a lot over time: image models have changed substantially, LLMs didn't always exist, we've built more tooling for ourselves, etc.)

The prompt usually asks for an aquarelle painting, often in the style of Thomas Schaller.  (Many other details, but I'm not the one usually doing artwork, so not the best positioned to point to common threads.)  And then there's a pretty huge amount of iteration and sometimes post-processing/tweaking.

RobertM

Almost every comment rate limit stricter than "once per hour" is in fact conditional in some way on the user's karma, and above 500 karma you can't even be (automatically) restricted to less than one comment per day:

https://github.com/ForumMagnum/ForumMagnum/blob/master/packages/lesswrong/lib/rateLimits/constants.ts#L108

  // 3 comments per day rate limits
    {
      ...timeframe('3 Comments per 1 days'),
      appliesToOwnPosts: false,
      rateLimitType: "newUserDefault",
      isActive: user => (user.karma < 5),
      rateLimitMessage: `Users with less than 5 karma can write up to 3 comments a day.<br/>${lwDefaultMessage}`,
    }, 
    {
      ...timeframe('3 Comments per 1 days'), // semi-established users can make up to 20 posts/comments without getting upvoted, before hitting a 3/day comment rate limit
      appliesToOwnPosts: false,
      isActive: (user, features) => (
        user.karma < 2000 && 
        features.last20Karma < 1
      ),  // requires 1 weak upvote from a 1000+ karma user, or two new user upvotes, but at 2000+ karma I trust you more to go on long conversations
      rateLimitMessage: `You've recently posted a lot without getting upvoted. Users are limited to 3 comments/day unless their last ${RECENT_CONTENT_COUNT} posts/comments have at least 2+ net-karma.<br/>${lwDefaultMessage}`,
    }, 
  // 1 comment per day rate limits
    {
      ...timeframe('1 Comments per 1 days'),
      appliesToOwnPosts: false,
      isActive: user => (user.karma < -2),
      rateLimitMessage: `Users with less than -2 karma can write up to 1 comment per day.<br/>${lwDefaultMessage}`
    }, 
    {
      ...timeframe('1 Comments per 1 days'),
      appliesToOwnPosts: false,
      isActive: (user, features) => (
        features.last20Karma < -5 && 
        features.downvoterCount >= (user.karma < 2000 ? 4 : 7)
      ), // at 2000+ karma, I think your downvotes are more likely to be from people who disagree with you, rather than from people who think you're a troll
      rateLimitMessage: `Users with less than -5 karma on recent posts/comments can write up to 1 comment per day.<br/>${lwDefaultMessage}`
    }, 
  // 1 comment per 3 days rate limits
    {
      ...timeframe('1 Comments per 3 days'),
      appliesToOwnPosts: false,
      isActive: (user, features) => (
        user.karma < 500 &&
        features.last20Karma < -15 && 
        features.downvoterCount >= 5
      ),
      rateLimitMessage: `Users with less than -15 karma on recent posts/comments can write up to 1 comment every 3 days. ${lwDefaultMessage}`
    }, 
  // 1 comment per week rate limits
    {
      ...timeframe('1 Comments per 1 weeks'),
      appliesToOwnPosts: false,
      isActive: (user, features) => (
        user.karma < 0 && 
        features.last20Karma < -1 && 
        features.lastMonthDownvoterCount >= 5 &&
        features.lastMonthKarma <= -30
      ),
      // Added as a hedge against someone with positive karma coming back after some period of inactivity and immediately getting into an argument
      rateLimitMessage: `Users with -30 or less karma on recent posts/comments can write up to one comment per week. ${lwDefaultMessage}`
    },
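
To illustrate the 500-karma point above, here is a small self-contained sketch.  This is not ForumMagnum's actual evaluation logic; the `Rule` type and `strictestLimit` helper below are simplified stand-ins, reusing only the conditions quoted above.  A hypothetical 600-karma user with heavily downvoted recent comments can only trip the 1/day rule, because the stricter rules require karma below 500 (or below 0):

    // Simplified stand-ins for the real types; the actual rules live in
    // packages/lesswrong/lib/rateLimits/constants.ts.
    type Features = { last20Karma: number; downvoterCount: number };
    type User = { karma: number };

    type Rule = {
      commentsPerDay: number; // limit expressed as comments per day, for comparison
      isActive: (user: User, features: Features) => boolean;
    };

    // Simplified versions of two of the rules quoted above.
    const rules: Rule[] = [
      {
        // "1 Comments per 1 days"
        commentsPerDay: 1,
        isActive: (u, f) =>
          f.last20Karma < -5 && f.downvoterCount >= (u.karma < 2000 ? 4 : 7),
      },
      {
        // "1 Comments per 3 days"
        commentsPerDay: 1 / 3,
        isActive: (u, f) =>
          u.karma < 500 && f.last20Karma < -15 && f.downvoterCount >= 5,
      },
    ];

    // Returns the strictest (lowest) applicable daily rate, or null if no rule fires.
    function strictestLimit(user: User, features: Features): number | null {
      const active = rules.filter(r => r.isActive(user, features));
      return active.length
        ? Math.min(...active.map(r => r.commentsPerDay))
        : null;
    }

    // 600 karma, recent comments at -20 net karma with 6 downvoters:
    // only the 1/day rule applies, since the 3-day rule requires karma < 500.
    console.log(strictestLimit({ karma: 600 }, { last20Karma: -20, downvoterCount: 6 })); // 1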

I think you could make an argument that being rate-limited to one comment per day is too strict given its conditions, but I don't particularly buy this as an argument against rate limiting long-term commenters in general.

But presumably you want long-term commenters with large net-positive karma to stay around and not be annoyed by the site UI by default.

A substantial design motivation behind the rate limits, beyond throttling newer users who haven't yet learned the ropes, was to reduce the incidence and blast radius of demon threads.  There might be other ways of accomplishing this, but it does require somehow discouraging or preventing users (even older, high-karma users) from contributing to them.  (I agree that it's reasonable to be annoyed by how the rate limits are currently communicated, which is a separate question from being annoyed at the rate limits existing at all.)

RobertM (Moderator Comment)

Hi Bharath, please read our policy on LLM writing before making future posts consisting almost entirely of LLM-written content.

In a lot of modern science, top-line research outputs often look like "intervention X caused a 7% change in metric Y, p < 0.03" (with some confidence intervals that intersect 0%).  This kind of relatively gear-free model can be pathological when it turns out that metric Y was actually caused by five different things, only one of which was responsive to intervention X, but in that case the effect size was very large.  (A relatively well-known example is the case of peptic ulcers, where most common treatments would often have no effect, because the ulcers were often caused by an H. pylori infection.)
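
As a toy illustration with made-up numbers (not from any particular study): if metric Y has five equally common causes and intervention X only helps the one it targets, a large 35% effect in that subgroup averages out to an unimpressive-looking top-line number:

    // Toy arithmetic with made-up numbers: five equally common causes of metric Y,
    // with intervention X only moving the one cause it actually targets.
    const subgroups = [
      { share: 0.2, effect: 0.35 }, // the responsive subgroup: large effect
      { share: 0.2, effect: 0.0 },
      { share: 0.2, effect: 0.0 },
      { share: 0.2, effect: 0.0 },
      { share: 0.2, effect: 0.0 },
    ];

    // A population-level estimate averages over everyone, diluting the effect.
    const topLineEffect = subgroups.reduce((sum, g) => sum + g.share * g.effect, 0);
    console.log(topLineEffect); // ≈ 0.07, i.e. the "7% change in metric Y"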

On the other end of the spectrum are individual trip reports and self-experiments.  These too have their pathologies[1], but they are at least capable of providing the raw contact with reality which is necessary to narrow down the search space of plausible theories and discriminate between hypotheses.

With the caveat that I'm default-skeptical of how this generalizes (which the post also notes), such basic foundational science seems deeply undersupplied at this level of rigor.  Curated.

  1. ^

    Taking psychedelic experiences at face value, for instance.

Also, I would still like an answer to my query for the specific link to the argument you want to see people engage with.

I haven't looked very hard, but sure, here's the first post that comes up when I search for "optimization user:eliezer_yudkowsky".

The notion of a "powerful optimization process" is necessary and sufficient to a discussion about an Artificial Intelligence that could harm or benefit humanity on a global scale.  If you say that an AI is mechanical and therefore "not really intelligent", and it outputs an action sequence that hacks into the Internet, constructs molecular nanotechnology and wipes the solar system clean of human(e) intelligence, you are still dead.  Conversely, an AI that only has a very weak ability to steer the future into regions high in its preference ordering, will not be able to much benefit or much harm humanity.

This paragraph contains most of the relevant section (at least w.r.t. your specific concerns; it doesn't argue for why most powerful optimization processes would eat everything by default, but that "why" is argued for at such extensive length elsewhere, in the discussions of convergent instrumental goals, that I will forgo sourcing it).

No, I don't think the overall model is unfalsifiable.  Parts of it would be falsified if we developed an ASI that was obviously capable of executing a takeover and it didn't, without us doing quite a lot of work to ensure that outcome.  (Not clear which parts, but probably something related to the difficulties of value loading & goal specification.)

Current AIs aren't trying to execute takeovers because they are weaker optimizers than humans.  (We can observe that even most humans are not especially strong optimizers by default, such that most people don't exert that much optimization power in their lives, even in a way that's cooperative with other humans.)  I think they have much less coherent preferences over future states than most humans.  If by some miracle you figure out how to create a generally superintelligent AI which itself does not have (more-coherent-than-human) preferences over future world states, whatever process it implements when you query it to solve a Very Difficult Problem will act as if it does.

EDIT: I see that several other people already made similar points re: sources of agency, etc.

RobertM

I think you misread my claim.  I claim that whatever models they had, they did not predict that AIs at current capability levels (which are obviously not capable of executing a takeover) would try to execute takeovers.  Given that I'm making a claim about what their models didn't predict, rather than what they did predict, I'm not sure what I'm supposed to cite here; EY has written many millions of words.  One counterexample would be sufficient for me to weaken (or retract) my claim.

EDIT: and my claim was motivated as a response to paragraphs like this from the OP:

It doesn’t matter that Claude is a bleeding heart and a saint, now. That is not supposed to be relevant to the threat model. The bad ones will come later (later, always later…). And when they come, they will be “like Claude” in all the ways that are alarming, while being unlike Claude in all the ways that might reassure.

Like, yes, in fact it doesn't really matter, under the original threat models.  If the original threat models said the current state of affairs was very unlikely to happen (particularly the part where, conditional on having economically useful but not superhuman AI, those AIs were not trying to take over the world), that would certainly be evidence against them!  But I would like someone to point to the place where the original threat models made that claim, since I don't think that they did.

RobertM

This is LLM slop.  At least it tells you that upfront (and that it's long).  Did you find any interesting, novel claims in it?
