Wiki Contributions


Order matters more at smaller scales - if you're training a small model on a lot of data and you sample in a sufficiently nonrandom manner, you should expect catastrophic forgetting to kick in eventually, especially if you use weight decay.

I think I can just tell a lot of stuff wrt human values! How do you think children infer them? I think in order for human values to not be viable to point to extensionally (ie by looking at a bunch of examples) you have to make the case that they're much more built-in to the human brain than seems appropriate for a species that can produce both Jains and (Genghis Khan era) Mongols.


I'd also note that "incentivize" is probably giving a lot of the game away here - my guess is you can just pull them out much more directly by gathering a large dataset of human preferences and predicting judgements.

Why do you expect it to be hard to specify given a model that knows the information you're looking for? In general the core lesson of unsupervised learning is that often the best way to get pointers to something you have a limited specification for is to learn some other task that necessarily includes it then specialize to that subtask. Why should values be any different? Broadly, why should values be harder to get good pointers to than much more complicated real-world tasks?

yeah I basically think you need to construct the semantic space for this to work, and haven't seen much work on that front from language modeling researchers.

drives me kinda nuts because I don't think it would actually be that hard to do, and the benefits might be pretty substantial.

Can you give an example of a theoretical argument of the sort you'd find convincing? Can be about any X caring about any Y.

On the impossible-to-you world: This doesn’t seem so weird or impossible to me? And I think I can tell a pretty easy cultural story slash write an alternative universe novel where we honor those who maximize genetic fitness and all that, and have for a long time—and that this could help explain why civilization and our intelligence developed so damn slowly and all that. Although to truly make the full evidential point that world then has to be weirder still where humans are much more reluctant to mode shift in various ways. It’s also possible this points to you having already accepted from other places the evidence I think evolution introduces, so you’re confused why people keep citing it as evidence.

The ability to write fiction in a world does not demonstrate its plausibility. Beware generalizing from fictional fictional evidence!

The claim that such a world is impossible is a claim that, were you to try to write a fictional version of it, you would run into major holes in the world that you would have to either ignore or paper over with further unrealistic assumptions.

In case it is not clear: My expectation is that sufficiently large capabilities/intelligence/affordances advances inherently break our desired alignment properties under all known techniques.

Nearly every piece of empirical evidence I've seen contradicts this - more capable systems are generally easier to work with in almost every way, and the techniques that worked on less capable versions straightforwardly apply and in fact usually work better than on less intelligent systems.

When I explain my counterargument to pattern 1 to people in person, they will very often try to "rescue" evolution as a worthwhile analogy for thinking about AI development. E.g., they'll change the analogy so it's the programmers who are in a role comparable to evolution, rather than SGD.

In general one should not try to rescue intuitions, and the frequency of doing this is a sign of serious cognitive distortions. You should only try to rescue intutions when they have a clear and validated predictive or pragmatic track record.

The reason for this is very simple - most intuitions or predictions one could make are wrong, and you need a lot of positive evidence to privilege any particular hypotheses re how or what to think. In the absence of evidence, you should stop relying on an intuition, or at least hold it very lightly.

The obvious question here is to what degree do you need new techniques vs merely to train new models with the same techniques as you scale current approaches.


One of the virtues of the deep learning paradigm is that you can usually test things at small scale (where the models are not and will never be especially smart) and there's a smooth range of scaling regimes in between where things tend to generalize.


If you need fundamentally different techniques at different scales, and the large scale techniques do not work at intermediate and small scales, then you might have a problem. If you need the same techniques as at medium or small scales for large scales, then engineering continues to be tractable even as algorithmic advances obsolete old approaches.

It's more like calling a human who's as smart as you are and directly plugged into your brain and in fact reusing your world model and train of thought directly to understand the implications of your decision. That's a huge step up from calling a real human over the phone!

The reason the real human proposal doesn't work is that

  1. the humans you call will lack context on your decision
  2. they won't even be able to receive all the context
  3. they're dumber and slower than you so even if you really could write out your entire chain of thoughts and intuitions consulting them for every decision would be impractical

Note that none of these considerations apply to integrated language models!

Load More