I'm very confused by "effective horizon length". I have at least two questions:

1) what are the units of "effective horizon length"?

The definition "how much data the model must process ..." suggests it is in units of information, and this is the case in the supervised learning extended example.

It's then stated that effective horizon length has units of subjective seconds [1].

Then, in the estimation of total training FLOP, the effective horizon length has units of subjective seconds per sample.
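For concreteness, the decomposition I have in mind is roughly the following (my own reconstruction with my own symbols, so the report's exact factorisation may differ):

$$\text{training FLOP} \;\approx\; \underbrace{F}_{\text{FLOP / subjective second}} \times \underbrace{H}_{\text{subjective seconds / sample}} \times \underbrace{N}_{\text{samples}}$$

where $H$ is the effective horizon length. The units only cancel if $H$ is read as subjective seconds per sample, which is what makes the "units of information" reading above confusing to me.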

2) what is the motivation for requiring a definition like this?

From doing the Fermi decomposition of total training FLOP, intuitively the quantity that needs to be estimated is something like "subjective seconds per sample for a TAI to use the datapoint as productively as a human". This seems quite removed from the perturbation definition, so I'd love some more motivation.

Oh, and additionally in [4 of 4], the "hardware bottlenecked" link in the responses section is broken.

-----

[1] I presume it's possible to convert between "amount of data" and "subjective seconds" by measuring the number of seconds required by the brain to process that much data. However, to me this is an implicit leap of faith.
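Spelled out, the conversion I take to be assumed is the following (where $D$ is the amount of data in bits and $r$ is a symbol I'm introducing for the brain's processing rate in bits per subjective second):

$$t_{\text{subjective}} \;=\; \frac{D}{r} \qquad \left[\frac{\text{bits}}{\text{bits}/\text{subjective second}} = \text{subjective seconds}\right]$$

The leap of faith is in treating $r$ as a well-defined, measurable quantity.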

In the AlphaZero interpretability paper [1], Ctrl+F "Ruy Lopez" for an example where the model's progress in quality was much faster than human progress.


[1] https://arxiv.org/pdf/2111.09259.pdf

What happened to the unrestricted adversarial examples challenge? The GitHub repo [1] hasn't had an update since 2020, and that was only to the warmup challenge. Additionally, were there any takeaways from the contest?


[1] https://github.com/openphilanthropy/unrestricted-adversarial-examples

This comes from you having assumed that "adversarial example" has a more specific definition than it actually does in the ML literature, right? Note that the Alignment Forum definition of "adversarial example" includes the misclassified panda as an example.
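For reference, the panda comes from the fast gradient sign method of Goodfellow et al.; below is a minimal sketch of that construction, assuming a differentiable PyTorch classifier (`model`, `x`, and `y` are hypothetical stand-ins, not from any particular codebase):

```python
# Minimal FGSM sketch: the construction behind the misclassified panda.
# `model`, `x`, `y` are hypothetical stand-ins: any differentiable classifier,
# an input batch with pixel values in [0, 1], and the true labels.
import torch
from torch.nn import functional as F

def fgsm(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
         eps: float = 8 / 255) -> torch.Tensor:
    """Return x nudged by eps in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step, then clip back to the valid pixel range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```

The point is just that a tiny, norm-bounded perturbation like this is the canonical ML-literature usage of "adversarial example", and the panda figure is exactly this construction.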

When do applications close?

When are applicants expected to begin work?

How long would such employment last?

I am interested in this criticism, particularly in connection with misconception 1 from Holden's 'Important, actionable research questions for the most important century', which to me suggests doing less paradigmatic research. (I interpret 'paradigmatic' as what 'normal science' looks like in ML research/industry, in the Structure of Scientific Revolutions sense; do say if I've misinterpreted 'paradigm'.)

I think this division would benefit from some examples, however. To what extent do you agree with a quick classification of mine?

Paradigmatic alignment research
1) Interpretability of neural nets (e.g. colah's vision and transformer circuits)
2) Dealing with dataset bias and generalisation in ML

Pre-paradigmatic alignment research
1) Agent foundations and the things MIRI works on
2) Proposals for alignment put forward by Paul Christiano, e.g. Iterated Amplification

My concern is that while the latter two problems are fuzzier and less well-defined, the former two are far less directly, if at all (in the case of 2), working on the problem we actually care about.