porby

The basic idea that lifting twice a week and doing cardio twice a week add up to a calorie expenditure that gets you the vast majority of exercise benefits compared to extreme athletes holds up, especially when you take reverse causality adjustments into account (survivorship bias on the genetic gifts of the extreme). Nothing I've encountered since has cast much doubt on this main takeaway.

I'd like to offer further reinforcement on this point:

  1. Any exercise, even trivial amounts like walking 20 minutes or doing two sets of exercise a day, can yield dramatic improvements in quality of life. Young-ish healthy-ish people don't tend to notice this because they're already above the "life isn't constantly painful and difficult" bar, but aging combined with a highly sedentary lifestyle can sneak up on some people. It can sometimes take very little to fix or avoid.
  2. Maintaining muscle mass that you've already built is much easier than building it in the first place. Some people find the thought of progressively overloading for eternity to be daunting- and if you don't want to, you don't have to! If you achieve a level of strength that satisfies your requirements, you can maintain it with relatively little effort. And if you do lose it, getting strong a second time will be easier.

In other words: when it comes to exercise, doing anything really does help! (Just don't hurt yourself.)

Something like this may be useful, but I do struggle to come up with workable versions that try to get specific about hardware details. Most options yield Goodhart problems- e.g. shift the architecture a little bit so that real world ML performance per watt/dollar is unaffected, but it falls below the threshold because "it's not one GPU, see!" or whatever else. Throwing enough requirements at it might work, but it seems weaker as a category than "used in a datacenter" given how ML works at the moment.

It could be that we have to bite the bullet and try for this kind of extra restriction anyway if ML architectures shift in such a way that internet-distributed ML becomes competitive, but I'm wary of pushing for it before that point because the restrictions would be far more visible to consumers.

In summary, maybeshrugidunno!

A hardware protection mechanism that needs to confirm permission to run by periodically dialing home would, even if restricted to large GPU installations, brick any large scientific computing system or NN deployment that needs to be air-gapped (e.g. because it deals with sensitive personal data, or particularly sensitive commercial secrets, or with classified data). Such regulation also provides whoever controls the green light a kill switch against any large GPU application that runs critical infrastructure. Both points would severely damage national security interests.

Yup! Probably don't rely on a completely automated system that only works over the internet for those use cases. There are fairly simple (for bureaucratic definitions of simple) workarounds. The driver doesn't actually need to send a message anywhere, it just needs a token. Airgapped systems can still be given those small cryptographic tokens in a reasonably secure way (if it is possible to use the system in a secure way at all), and for systems where this kind of feature is simply not an option, it's probably worth having a separate regulatory path. I bet NVIDIA would be happy to set up some additional market segmentation at the right price.
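To make the "small cryptographic token" idea concrete, here's a minimal sketch. Everything here is an illustrative assumption, not a description of any real NVIDIA mechanism: the driver holds a verification key and accepts a token asserting "authorized until time T", signed by the regulator. No network connection is needed; the token can be carried into an air-gapped facility on removable media. (A real design would use asymmetric signatures so the driver never holds a signing secret; HMAC keeps the sketch short.)

```python
import hmac
import hashlib
import struct
import time

# Illustrative shared secret; a real scheme would use public-key signatures.
REGULATOR_KEY = b"shared-secret-for-illustration-only"

def issue_token(key: bytes, valid_until: float) -> bytes:
    """Regulator side: sign an expiry timestamp."""
    payload = struct.pack(">d", valid_until)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag

def driver_accepts(key: bytes, token: bytes, now: float) -> bool:
    """Driver side: verify the signature and that the token hasn't expired."""
    payload, tag = token[:8], token[8:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False
    (valid_until,) = struct.unpack(">d", payload)
    return now <= valid_until

# Token issued during an audit, valid for 90 days; verification is offline.
token = issue_token(REGULATOR_KEY, valid_until=time.time() + 90 * 86400)
print(driver_accepts(REGULATOR_KEY, token, now=time.time()))  # True
```

Note that verification requires no clock synchronization with the regulator, only a reasonably accurate local clock, which is part of why the airgapped case stays workable.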

The unstated assumption was that the green light would be controlled by US regulatory entities for hardware sold to US entities. Other countries could have their own agencies, and there would need to be international agreements to stop "jailbroken" hardware from being the default, but I'm primarily concerned about companies under the influence of the US government and its allies anyway (for now, at least).

techniques similar in spirit have been seriously proposed to regulate use of cryptography (for instance, via adoption of the Clipper chip), but I think it's fair to say they have not been very successful.

I think there's a meaningful difference between attempts to regulate cryptography and regulating large machine learning deployments; consumers will never interact with the regulatory infrastructure, and the negative externalities are extremely small compared to compromised or banned cryptography.

I'm generally on board with attempts to have more precise options for referring to these concepts, and in this context I agree that policy as a term is more appropriate and that gradients from RL training don't magically include more agent juice.

That said, I do think there is an important distinction between the tendencies of systems built with RL versus supervised learning that arises from reward sparsity.

In traditional RL, individual policy outputs aren't judged in as much detail as in supervised learning. Even with reward shaping, the training signal is still likely to be far less densely defined and constrained than, say, a per-output predictive loss.

Since the target is smaller and more distant, traditional RL gives the optimizer more room to roam. I think it's correct to say that most RL implementations will have a lot of reactive bits and pieces that are selected to form the final policy, but because learning instrumental behavior is effectively required for traditional RL to get anywhere at all, it's more likely (than in predictive loss) that nonmyopic internal goal-like representations will be learned as a part of those instrumental behaviors.

Training on purely predictive loss, in contrast, is both densely informative and extremely constraining. Goals are less obviously convergently useful, and any internal goal representations that are learned need to fit within the bounds enforced by the predictive loss and should tend to be more local in nature as a result. Learned values that overstep their narrowly-defined usefulness get directly slapped by other predictive samples.

I think the greater freedom RL training tends to have, and the greater tendency to learn more broadly applicable internal goals to drive the required instrumental behavior, do make RL-trained systems feel more "agentic" even if it is not absolutely fundamental to the training process, nor even really related to the model's coherence.
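The density gap above can be made concrete with a toy count (my illustration, not from the comment; the numbers are made up) of how many scalar learning signals each regime provides for the same amount of experience:

```python
# Toy comparison: scalar learning signals per unit of experience.
tokens_per_sequence = 512   # assumed sequence length
steps_per_episode = 512     # assumed episode length

# Dense predictive loss: one cross-entropy term per token position, each
# directly constraining the model's output distribution at that position.
predictive_signals = tokens_per_sequence

# Sparse traditional RL: a single terminal reward judges the whole episode.
sparse_rl_signals = 1

# Reward shaping narrows the gap but rarely closes it: even one shaped
# reward per step is a lone scalar judging an entire action, not a full
# target distribution over outputs.
shaped_rl_signals = steps_per_episode

print(predictive_signals // sparse_rl_signals)  # 512x denser than sparse RL
```

The point isn't the exact ratio; it's that the sparser the signal, the larger the space of internal solutions (including goal-like ones) consistent with it.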

Thanks! Just updated the edited version link to the latest version too. Unfortunately, the main new content is redacted, so it'll be pretty familiar.

This is a project I'd like to see succeed!

For what it's worth, I talked to Alexandra around EAG London a couple of times (I'm Ross, hi again!) and I think she has a good handle on important coordination problems. I encourage people to apply.

Bit of a welp:

NVIDIA Q1 FY24 filings just came out. In the May 9th edit, I wrote:

I suspect that NVIDIA’s data center revenue will recover in the next year or so. 

In reality, it had already recovered and was in the process of setting a new record.

If the number of tokens in the input sentence is the input size of its time complexity, which I'm sure you can agree is the obvious choice

Yeah, you're not alone in thinking that- I think several people have been tripped up by that in the post. Without making it clear, my analysis just assumed that the context window was bounded by some constant, so scaling with respect to token counts went out the window. So:

Correct me if I'm wrong but it seems like you are saying that for each token generated, the transformer is only allowed to process for a constant amount of time ... Additionally assuming it is only generating one token. 

Yup.

This is one of the things I'm clarifying for the openphil submission version of the post, along with a section trying to better tie together why it matters. (More than one person has come away thinking something like "but every algorithm bottoms out at individual constant time steps, this isn't interesting, CoT etc.")
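The assumption being clarified can be sketched with a crude cost model (attention pairs only, purely for illustration; the cap value is arbitrary): once the context window is bounded by a constant, the work per generated token is bounded by a constant, so the only remaining scaling is linear in the number of generated tokens.

```python
CONTEXT_CAP = 2048  # assumed constant bound on the context window

def cost_per_token(context_tokens: int) -> int:
    """Crude attention cost: a new token attends to the visible window."""
    return min(context_tokens, CONTEXT_CAP)

def generation_cost(prompt_tokens: int, generated: int) -> int:
    """Total cost of generating `generated` tokens, one at a time."""
    total = 0
    for i in range(generated):
        total += cost_per_token(prompt_tokens + i)
    return total

# With the window saturated, doubling the generated tokens doubles the
# cost: O(k) overall, O(1) per token.
print(generation_cost(2048, 1000))  # 2048 * 1000 = 2048000
print(generation_cost(2048, 2000))  # 4096000
```

Without the cap, `cost_per_token` grows with position and the same loop is quadratic in total, which is where the "scaling with respect to token counts" confusion comes from.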

Quarter-baked ideas for potential future baking:

  1. A procedure for '~shardifying'[1] an incoherent utility function into a coherent utility function by pushing preferences into conditionals. Example of an extreme case of this would be an ideal predictor (i.e. one which has successfully learned values fit to the predictive loss, not other goals, and does not exhibit internally motivated instrumental behavior) trained to perfectly predict the outputs of an incoherent agent.

    The ideal predictor model, being perfectly conditional, would share the same outputs but would retain coherence: inconsistencies in the original utility function are remapped to be conditional. Apparent preference cycles over world states are fine if the utility function isn't primarily concerned with world states. The ideal predictor is coherent by default- it doesn't need to work out any kinks to avoid stepping on its own toes.

    Upon entering a hypothetical capability-induced coherence death spiral, what does the original inconsistent agent do? Does it try to stick to object-level preferences, forcing it to violate its previous preferences in some presumably minimized way?[2] Or does it punt things into conditionality to maintain behaviors implied by the original inconsistencies? Is that kind of shardification convergent?
  2. Is there a path to piggybacking on greed/noncompetitive inclinations for restricting compute access in governance? One example: NVIDIA already requires that data center customers purchase its vastly more expensive data center products. The driver licenses for the much cheaper gaming class hardware already do not permit use cases like "build a giant supercomputer for training big LLMs."

    Extend this to, say, a dead man's switch built into the driver: if the GPU installation hasn't received an appropriate signal recently (implying that the relevant regulatory entity has not been able to continue its audits of the installation and its use), the cluster simply dies.

    Modified drivers could bypass some of the restrictions, but some hardware involvement would make it more difficult. NVIDIA may already be doing this kind of hardware-level signing to ensure that only approved drivers can be used (I haven't checked). It's still possible in principle to bypass- the hardware and software are both in the hands of the enemy- but it would be annoying.

    Even if they don't currently do that sort of check, it would be relatively simple to add some form of it with a bit of lead time.

    By creating more regulatory hurdles that NVIDIA (or other future dominant ML hardware providers) can swallow without stumbling too badly, regulators would hand them a bit of extra moat against up-and-comers. It'd be in their interest to get the government to add those regulations, and then they could extract a bit more profit from hyperscalers.
  1. ^

    I'm using the word "shard" here to just mean "a blob of conditionally activated preferences." It probably imports some other nuances that might be confusing; I haven't read enough shard theory to catch where the usage doesn't fit.

  2. ^

    This idea popped into my head during a conversation with someone working on how inconsistent utilities might be pushed towards coherence. It was at the Newspeak House the evening of the day after EAG London 2023. Unfortunately, I promptly forgot their name! (If you see this, hi, nice talking to you, and sorry!)
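The dead man's switch from point 2 above reduces to a grace-period check. A minimal sketch, where all names and the 30-day grace period are illustrative assumptions rather than a description of any real driver:

```python
GRACE_PERIOD = 30 * 86400  # seconds the cluster may run without renewal

class DriverGate:
    """Hypothetical driver-side gate: work runs only while audits are current."""

    def __init__(self, now: float):
        self.last_valid_signal = now

    def record_signal(self, now: float) -> None:
        # A real design would first verify a signature over the signal,
        # ideally in hardware so a patched driver can't skip the check.
        self.last_valid_signal = now

    def may_run(self, now: float) -> bool:
        return (now - self.last_valid_signal) <= GRACE_PERIOD

gate = DriverGate(now=0.0)
print(gate.may_run(now=29 * 86400))  # True: within the grace period
print(gate.may_run(now=31 * 86400))  # False: audits lapsed, cluster halts
gate.record_signal(now=31 * 86400)
print(gate.may_run(now=40 * 86400))  # True again after renewal
```

The grace period is the knob that trades off regulatory responsiveness against tolerance for logistical hiccups at legitimate installations.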

The openphil contest is approaching, so I'm working on an edited version. Keeping this original version as-is seems like a good idea- both as a historical record and because there's such a nice voiceover!

I've posted the current version over on manifund with a pdf version. If you aren't familiar with manifund, I'd recommend poking around. Impact certificates are neat, and I'd like them to become more of a thing!

The main changes are: 

  1. Added a short section trying to tie together why the complexity argument actually matters.
  2. Updated a few spots with notes for things that have happened between October 2022 and May 2023, including a section dedicated to NVIDIA quarterly reports.
  3. A very large new section named The Redacted Section dedicated to [REDACTED].
  4. Removed pretty much all risk-related arguments in favor of focusing on timelines to save some words. It is still very long, ack.
  5. A bunch of small clarity/quality edits, including the removal of 704 unnecessary instances of the word "actually."

Overall, I'm pretty happy with how the post has fared in the last several months. The largest miss is probably the revenue forecasts- I didn't anticipate massive semiconductor export restrictions. Given the complexity, I'm not sure how to interpret this in terms of AI timelines yet. It's notable that hyperscalers are a large and rapidly growing customer base for NVIDIA that already managed to mitigate temporary losses, and I doubt the recently strengthened race dynamics are going to change that (until those companies decide to push alternatives for ML hardware).

My timelines haven't noticeably changed. GPT-4 is around the median of my previous vaguely-gut-defined capability distribution. I anticipate the next generation of applications that build some infrastructure around GPT-4 level systems (like the next version of github copilot) will surprise a few more people, just because the full capability of GPT-4 isn't immediately apparent in a pure dialogue setting.

My P(doom) has actually decreased since I wrote the post: I'm down to around 30-35% ish. I had only recently gotten into serious technical safety research when I wrote the post, so some volatility isn't surprising, but I'm glad it went the direction it did. That reduction is mostly related to some potential implications of predictor/simulator research efforts (not necessarily complete solutions, but rather certain things being easier than expected) and positive news about the nature of the problem and interpretability. (Worth noting that this number assumes effort- I do not expect default flailing to work out- and that my estimate should still be treated as relatively volatile.)
