All of jsteinhardt's Comments + Replies

Hi Alex,

Let me first acknowledge that your write-up is significantly more thorough than pretty much all content on LessWrong, and that I found the particular examples interesting. I also appreciated that you included a related work section in your write-up. The reason I commented on this post and not others is because it's one of the few ML posts on LessWrong that seemed like it might teach me something, and I wish I had made that more clear before posting critical feedback (I was thinking of the feedback as directed at Oliver / Raemon's moderation norms, ... (read more)

Thanks so much, I really appreciate this comment. I think it'll end up improving this post/the upcoming paper. 

(I might reply later to specific points)

I'll just note that I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.

I'm not sure what you mean about whether the post was "missing something important", but I do think that you should be pretty worried about LessWrong's collective epistemics that Dan H is the only one bringing this important point up, and that rather than being rewarded for doing so or engaged w... (read more)

TurnTrout (20d)

I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.

The answer is: No, our work is very different from that paper. Here's the paragraph in question:

Editing Models with Task Arithmetic explored a "dual" version of our activation additions. That work took vectors between weights before and after finetuning on a new task, and then added or subtracted task-specific weig

... (read more)

Here is my take: since there's so much AI content, it's not really feasible to read all of it, so in practice I read almost none of it (and consequently visit LW less frequently).

The main issue I run into is that for most posts, on a brief skim it seems like basically a thing I have thought about before. Unlike academic papers, most LW posts do not cite previous related work nor explain how what they are talking about relates to this past work. As a result, if I start to skim a post and I think it's talking about something I've seen before, I have no easy ... (read more)

Ruby (8mo)
Over the years I've thought about a "LessWrong/Alignment" journal article format the way regular papers have Abstract-Intro-Methods-Results-Discussion. Something like that, but tailored to our needs, maybe also bringing in OpenPhil-style reasoning transparency (but doing a better job of communicating models). Such a format could possibly mandate what you're wanting here. I think it's tricky: you have to believe any such format actually makes posts better rather than constraining them, and that it's worth the effort for writers to conform to it. It is something I'd like to experiment with though.

and consequently visit LW less frequently

Tangentially, "visiting LW less frequently" is not necessarily a bad thing. We are not in the business of selling ads; we do not need to maximize the time users spend here. Perhaps it would be better if people spent less time online (including on LW) and more time doing whatever meaningful things they might do otherwise.

But I agree that even assuming this, "the front page is full of things I do not care about" is a bad way to achieve it.

Tools for citation to the existing corpus of LessWrong posts and to off-site scientific papers would be amazing; e.g., rolling search for related academic papers as you type your comment via the Semantic Scholar API, combined with search over LessWrong for all proper nouns in your comment. Or something. I have a lot of stuff I want to say that I expect and intend to be mostly reference to citations, but formatting the citations for use on LessWrong is a chore, and I suspect that most folks here don't skim as many papers as I do. (That said, folks like yourself c... (read more)

I think this might be an overstatement. It's true that NSF tends not to fund developers, but in ML the NSF is only one of many funders (lots of faculty have grants from industry partnerships, for instance).

Adam Jermyn (10mo)
Ah, this is a good point! I'm thinking more of physics, which has much more centralized funding provided by a few actors (and where I see tons of low-hanging fruit if only some full-time SWEs could be hired). In other fields YMMV.

Thanks for writing this!

Regarding how surprise on current forecasts should factor into AI timelines, two takes I have:

 * Given that all the forecasts seem to be wrong in the "things happened faster than we expected" direction, we should probably expect HLAI to happen faster than expected as well.

 * It also seems like we should retreat more to outside views about general rates of technological progress, rather than forming a specific inside view (since the inside view seems to mostly end up being wrong).

I think a pure outside view would give a med... (read more)

elifland (1y)
I don't think we should update too strongly on these few data points; e.g. a previous analysis of Metaculus' AI predictions [https://forum.effectivealtruism.org/posts/vtiyjgKDA3bpK9E4i/an-examination-of-metaculus-resolved-ai-predictions-and] found "weak evidence to suggest the community expected more AI progress than actually occurred, but this was not conclusive". MATH and MMLU feel more relevant than the average Metaculus AI prediction, but not enough to strongly outweigh the previous findings.

I'd be interested to check out that dataset! Hard for me to react too much to the strategy without more details, but outside-view-ish reasoning about predicting things far-ish in the future that we don't know much about (and, as you say, have often been wrong on the inside view) seems generally reasonable to me.

I mentioned in the post that my median is now ~2050, which is 28 years out. As for how I formed my forecast, I originally started roughly with Ajeya's report [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines], added some uncertainty, and had previously shifted further out due to intuitions I had about data/environment bottlenecks, unknown unknowns, etc. I still have lots of uncertainty, but my median has moved sooner to 2050 due to MATH forcing me to adjust my intuitions some, reflections on my hesitations against short-ish timelines [https://forum.effectivealtruism.org/posts/5hprBzprm7JjJTHNX/reasons-i-ve-been-hesitant-about-high-levels-of-near-ish-ai-1], and Daniel Kokotajlo's work [https://www.lesswrong.com/s/5Eg2urmQjA4ZNcezy/p/HhWhaSzQr6xmBki8F].

Thanks! I just read over it, and assuming I understood correctly, this bottleneck primarily happens for "small" operations like layer normalization and softmax, and not for large matrix multiplies. In addition, these small operations are still the minority of runtime (40% in their case). So I think this is still consistent with my analysis, which assumes various things will creep in to keep GPU utilization around 40%, but that they won't ever drive it to (say) 10%. Is this correct or have I misunderstood the nature of the bottleneck?

Edit: also maybe we're ju... (read more)
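As a quick illustration of that utilization argument, here is a crude time-weighted model. The 40% runtime share comes from the discussion above; the per-kernel utilization figures are made-up, purely illustrative numbers:

```python
# Crude Amdahl-style model of overall GPU utilization. Assumed, illustrative
# numbers: matmuls run at 60% of peak FLOP/s, bandwidth-bound ops (layer norm,
# softmax, ...) at 5% of peak, and the bandwidth-bound ops take 40% of runtime.
matmul_time_frac, small_time_frac = 0.6, 0.4
matmul_util, small_util = 0.60, 0.05

overall_util = matmul_time_frac * matmul_util + small_time_frac * small_util
print(f"overall utilization ~= {overall_util:.0%}")  # ~38%: near 40%, nowhere near 10%
```

Under these assumptions the bandwidth-bound operations drag utilization down only modestly, which is the sense in which the analysis stays "around 40%".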

Short answer: If future AI systems are doing R&D, it matters how quickly the R&D is happening.

Okay, thanks! The posts actually are written in markdown, at least on the backend, in case that helps you.

habryka (1y)
In that case, if the Markdown dialect matches up, everything might just work fine if you activate the Markdown editor in your Account settings, and then copy-paste the text into the editor (I would try it first in a new post, to make sure it works).

Question for mods (sorry if I asked this before): Is there a way to make the LaTeX render?

In theory MathJax should be enough, e.g. that's all I use at the original post: https://bounded-regret.ghost.io/how-fast-can-we-perform-a-forward-pass/

habryka (1y)
Yeah, sorry. The difference between the rendering systems and your blog is very minor, but has annoying effects in this case. The delimiters we use in HTML are \( and \) instead of the $ on your blog, since that reduces potential errors with people using currency and other similar things. If you submit your HTML with \( and \), then it should render correctly. I also have a short script I can use to fix this, though it currently requires manual effort each time. I might add a special case to your blog or something to change it automatically, though it would probably take me a bit to get around to. Alternatively, if you write your posts in Markdown on your blog, then that would also translate straightforwardly into the right thing here.
Said Achmiz (1y)
If you edit the post on GreaterWrong, you should be able to paste in the LaTeX source and have it render, e.g.:

\[ \text{ElapsedTime} = \begin{cases} \frac{5}{4} C^2 M^3 & \text{if } B > \frac{2}{3} M \sqrt{N} \\ \frac{8 C^2}{N B^2}\left(M - B/\sqrt{N}\right) & \text{else.} \end{cases} \]

(I don’t know how to do it in the Less Wrong editor, but presumably it’s possible there as well.)

I was surprised by this claim. To be concrete, what's your probability of xrisk conditional on 10-year timelines? Mine is something like 25% I think, and higher than my unconditional probability of xrisk.

Rohin Shah (1y)
(Ideally we'd be clearer about what timelines we mean here, I'll assume it's TAI timelines for now.) Conditional on 10-year timelines, maybe I'm at 20%? This is also higher than my unconditional probability of x-risk. I'm not sure which part of my claim you're surprised by? Given what you asked me, maybe you think that I think that 10-year timelines are safer than >10-year timelines? I definitely don't believe that. My understanding was that this post was suggesting that timelines are longer than 10 years, e.g. from sentences like this: And that's the part I agree with (including their stated views about what will happen in the next 10 years).

Fortunately (?), I think the jury is still out on whether phase transitions happen in practice for large-scale systems. It could be that once a system is complex and large enough, it's hard for a single factor to dominate and you get smoother changes. But I think it could go either way.

Thanks! I pretty much agree with everything you said. This is also largely why I am excited about the work, and I think what you wrote captures it more crisply than I could have.

Yup, I agree with this, and think the argument generalizes to most alignment work (which is why I'm relatively optimistic about our chances compared to some other people, e.g. something like 85% p(success), mostly because most things one can think of doing will probably be done).

It's possibly an argument that work is most valuable in cases of unexpectedly short timelines, although I'm not sure how much weight I actually place on that.

Note the answer changes a lot based on how the question is operationalized. This stronger operationalization has dates around a decade later.

Yup! That sounds great :)

Ruby (1y)
Here it is! https://www.lesswrong.com/s/4aARF2ZoBpFZAhbbe

You might want to edit the description and header image.

Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?

Ruby (1y)
We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?
Ruby (1y)
Done!

@Mods: Looks like the LaTeX isn't rendering. I'm not sure what the right way to do that is on LessWrong. On my website, I do it with code injection. You can see the result here, where the LaTeX all renders in MathJax: https://bounded-regret.ghost.io/ml-systems-will-have-weird-failure-modes-2/

habryka (1y)
Yeah, sorry, we are currently importing your post directly as HTML. We don't do code-injection, we figure out what the right HTML for displaying the LaTeX is server-side, and then store that directly in the HTML for the post.

The reason why it isn't working out of the box is that we don't support single-dollar-sign delimiters for LaTeX in HTML, because they have too many false-positives with people just trying to use dollar signs in normal contexts. Everything would actually work out by default if you used the MathJax \( and \) delimiters instead, which are much less ambiguous.

I will convert this one manually for now, not sure what the best way moving forward is. Maybe there is a way you can configure your blog to use the \( and \) delimiters instead, or maybe we can adjust our script to get better at detecting when people want to use the single-dollar-delimiter for MathJax purposes, versus other purposes.
delton137 (1y)
I just did some tests... it works if you go to settings and click "Activate Markdown Editor". Then convert to Markdown and re-save (note, you may want to back up before this, there's a chance footnotes and stuff could get messed up).  $stuff$ for inline math and double dollar signs for single line math work when in Markdown mode. When using the normal editor, inline math doesn't work, but $$ works (but puts the equation on a new line). 
Mark Xu (1y)
I think latex renders if you're using the markdown editor, but if you're using the other editor then it only works if you use the equation editor.

I feel like you are arguing for a very strong claim here, which is that "as soon as you have an efficient way of determining whether a problem is solved, and any way of generating a correct solution some very small fraction of the time, you can just build an efficient solution that solves it all of the time"

Hm, this isn't the claim I intended to make. Both because it overemphasizes "efficient" and because it adds a lot of "for all" statements.

If I were trying to state my claim more clearly, it would be something like "generically, for the large majority... (read more)

habryka (1y)
I am a bit confused by what we mean by "of the sort you would come across in ML". Like, this situation, where we are trying to derive an algorithm that solves problems without optimizers from an algorithm that solves problems with optimizers, is that "the sort of problem you would come across in ML"? It feels pretty different to me from most usual ML problems.

I also feel like in ML it's quite hard to actually do this in practice. Like, it's very easy to tell whether a self-driving car AI has an accident, but not very easy to actually get it to not have any accidents. It's very easy to tell whether an AI can produce a Harry Potter-level quality novel, but not very easy to get it to produce one. It's very easy to tell if an AI has successfully hacked some computer system, but very hard to get it to actually do so.

I feel like the vast majority of real-world problems we want to solve do not currently follow the rule of "if you can distinguish good answers you can find good answers". Of course, success in ML has been for the few subproblems where this turned out to be easy, but clearly our prior should be on this not working out, given the vast majority of problems where this turned out to be hard.

(Also, to be clear, I think you are making a good point here, and I am pretty genuinely confused about which kinds of problems the thing you are saying does turn out to be true for, and appreciate your thoughts here)

Thanks for the push-back and the clear explanation. I still think my points hold and I'll try to explain why below.

In order to even get a single expected datapoint of approval, I need to sample 10^8 examples, which in our current sampling method would take 10^8 * 10 hours, e.g. approximately 100,000 years. I don't understand how you could do "Learning from Human Preferences" on something this sparse

This is true if all the other datapoints are entirely indistinguishable, and the only signal is "good" vs. "bad". But in practice you would compare / rank the d... (read more)

habryka (1y)
Well, sure, but that is changing the problem formulation quite a bit. It's also not particularly obvious that it helps very much, though I do agree it helps. My guess is even with a rank-ordering, you won't get the 33 bits out of the system in any reasonable amount of time at 10 hours evaluation cost. I do think if you can somehow give more mechanistic and detailed feedback, I feel more optimistic in situations like this, but also feel more pessimistic that we will actually figure out how to do that in situations like this.

I feel like you are arguing for a very strong claim here, which is that "as soon as you have an efficient way of determining whether a problem is solved, and any way of generating a correct solution some very small fraction of the time, you can just build an efficient solution that solves it all of the time".

This sentence can of course be false without implying that the human preferences work is impossible, so there must be some confusion happening. I am not arguing that this is impossible for all problems; indeed, ML has shown that this is quite feasible for a lot of problems. But making the claim that it works for all of them is quite strong, and I also feel like it's obvious enough that this is very hard or impossible for a large other class of problems (like, e.g., reversing hash functions), so we shouldn't assume that we can just do this for an arbitrary problem.
habryka (1y)
I was talking about "costly" in terms of computational resources. Like, of course if I have a system that gets the right answer in 1/100,000,000 cases, and I have a way to efficiently tell when it gets the right answer, then I can get it to approximately always give me the right answer by just running it a billion times. But that will also take a billion times longer.

In practice, I expect that in most situations where you have the combination of "in one in a billion cases I get the right answer and it costs me $1 to compute an answer" and "I can tell when it gets the right answer", you won't get to a point where you can compute a right answer for anything close to $1.

This would imply a fixed upper bound on the number of bits you can produce (for instance, a false negative rate of 1 in 128 implies at most 7 bits). But in practice you can produce many more than 7 bits, by double checking your answer, combining multiple sources of information, etc.

JBlack (1y)
Combining multiple sources of information, double checking, etc. are ways to decrease error probability, certainly. The problem is that they're not independent. For highly complex spaces, not only does the number of additional checks you need increase super-linearly, but the number of types of checks you need likely also increases super-linearly. That's my intuition, at least.

Maybe, but I think some people would disagree strongly with this list even in the abstract (putting almost no weight on Current ML, or putting way more weight on humans, or something else). I agree that it's better to drill down into concrete disagreements, but I think right now there are implicit strong disagreements that are not always being made explicit, and this is a quick way to draw them out.

Basically the same techniques as in Deep Reinforcement Learning from Human Preferences and the follow-ups--train a neural network model to imitate your judgments, then chain it together with RL.

I think current versions of that technique could easily give you 33 bits of information--although as noted elsewhere, the actual numbers of bits you need might be much larger than that, but the techniques are getting better over time as well.
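For concreteness, here is a minimal sketch of that kind of pipeline: a pairwise-preference reward model (Bradley-Terry style) trained on toy random "plan features", then chained with simple best-of-n selection standing in for the RL step of the original technique. The architecture, dimensions, and data here are hypothetical stand-ins, not the setup from that paper:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a plan's feature vector; trained to imitate pairwise human judgments."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # Bradley-Terry: maximize log P(preferred ranked above rejected).
    return -torch.nn.functional.logsigmoid(model(preferred) - model(rejected)).mean()

dim = 16
model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):  # toy training loop on random "plans"
    preferred, rejected = torch.randn(32, dim), torch.randn(32, dim)
    loss = preference_loss(model, preferred, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Chain it with an optimizer": here just best-of-n selection over sampled plans,
# in place of RL fine-tuning against the learned reward.
with torch.no_grad():
    candidates = torch.randn(10_000, dim)
    best_plan = candidates[model(candidates).argmax()]
```

Each pairwise comparison supplies at most one bit of selection pressure, which is why the number of bits needed (7 vs. 33) bears directly on how much feedback such an approach requires.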

habryka (1y)
Hmm, I don't currently find myself very compelled by this argument. Here are some reasons:

In order to even get a single expected datapoint of approval, I need to sample 10^8 examples, which in our current sampling method would take 10^8 * 10 hours, e.g. approximately 100,000 years. I don't understand how you could do "Learning from Human Preferences" on something this sparse.

I feel even beyond that, this still assumes that the reason it is proposing a "good" plan is pure noise, and not the result of any underlying bias that is actually costly to replace. I am not fully sure how to convey my intuitions here, but here is a bad analogy: It seems to me that you can have go-playing-algorithms that lose 99.999% of games against an expert AI, but that doesn't mean you can distill a competitive AI that wins 50% of games, even though it's "only 33 bits of information".

Like, the reason why your AI is losing has a structural reason, and the reason why the AI is proposing consequentialist plans also has a structural reason, so even if we get within 33 bits (which I do think seems unlikely), it's not clear that you can get substantially beyond that, without drastically worsening the performance of the AI. In this case, it feels like maybe an AI gets lucky and stumbles upon a plan that solves the problem without creating a consequentialist reasoner, but it's doing that out of mostly luck, not because it actually has a good generator for non-consequentialist-reasoner-generating-plans, and there is no reliable way to always output those plans without actually sampling at least something like 10^4 plans.

The intuition of "as soon as I have an oracle for good vs. bad plans I can chain an optimizer to find good plans" feels far too strong to me in generality, and I feel like I can come up with dozens of counterexamples where this isn't the case. Like, I feel like... this is literally a substantial part of the P vs. NP problem, and I can't just assume my algorithm just li

Yes, I think I understand that more powerful optimizers can find more spurious solutions. But the OP seemed to be hypothesizing that you had some way to pick out the spurious from the good solutions, but saying it won't scale because you have 10^50, not 100, bad solutions for each good one. That's the part that seems wrong to me.

JBlack (1y)
Your "harmfulness" criteria will always have some false negative rate. If you incorrectly classify a harmful plan as beneficial one time in a million, in the former case you'll get 10^44 plans that look good but are really harmful for every one that really is good. In the latter case you get 10000 plans that are actually good for each one that is harmful.

That part does seem wrong to me. It seems wrong because 10^50 is possibly too small. See my post Seeking Power is Convergently Instrumental in a Broad Class of Environments:

If the agent flips the first bit, it's locked into a single trajectory. None of its actions matter anymore.

But if the agent flips the second bit – this may be suboptimal for a utility function, but the agent still has lots of choices remaining. In fact, it still can induce  observation histories. If  and , then that's  

... (read more)

I'm not sure I understand why it's important that the fraction of good plans is 1% vs .00000001%. If you have any method for distinguishing good from bad plans, you can chain it with an optimizer to find good plans even if they're rare. The main difficulty is generating enough bits--but in that light, the numbers I gave above are 7 vs 33 bits--not a clear qualitative difference. And in general I'd be kind of surprised if you could get up to say 50 bits but then ran into a fundamental obstacle in scaling up further.
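For reference, the arithmetic behind those bit counts (using the 1% and .00000001% fractions from the paragraph above):

```python
import math

# Selection pressure needed to pick out a good plan occurring with frequency p,
# and the expected number of naive rejection samples before the first hit.
for p in (1e-2, 1e-10):
    bits = math.log2(1 / p)
    expected_samples = 1 / p
    print(f"p = {p:.0e}: ~{bits:.0f} bits of selection, ~{expected_samples:.0e} samples")
```

This prints roughly 7 bits vs. 33 bits, matching the numbers in the comment; the contrast is that the required bits grow only logarithmically even as the naive sample count explodes.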

Can you be more concrete about how you would do this? If my method for evaluation is "sit down and think about the consequences of doing this for 10 hours", I have no idea how I would chain it with an optimizer to find good plans even if they are rare.

JBlack (1y)
I think the problem is not quite so binary as "good/bad". It seems to be more effective vs ineffective and beneficial vs harmful. The problem is that effective plans are more likely to be harmful. We as a species have already done a lot of optimization in a lot of dimensions that are important to us, and the most highly effective plans almost certainly have greater side effects that make thing worse in dimensions that we aren't explicitly telling the optimizer to care about. It's not so much that there's a direct link between sparsity of effective plans and likelihood of bad outcomes, as that more complex problems (especially dealing with the real world) seem more likely to have "spurious" solutions that technically meet all the stated requirements, but aren't what we actually want. The beneficial effective plans become sparse faster than the harmful effective plans, simply because in a more complex space there are more ways to be unexpectedly harmful than good.

Thanks! Yes, this makes very similar points :) And from 4 years ago!

The fear of anthropomorphising AI is one of the more ridiculous traditional mental blindspots in the LW/rationalist sphere.

You're really going to love Thursday's post :).

Jokes aside, I actually am not sure LW is that against anthropomorphising. It seems like a much stronger injunction among ML researchers than it is on this forum.

I personally am not very into using humans as a reference class because it is a reference class with a single data point, whereas e.g. "complex systems" has a much larger number of data points.

In addition, it seems like intuition ... (read more)

Okay I think I get what you're saying now--more SGD steps should increase "effective model capacity", so per the double descent intuition we should expect the validation loss to first increase then decrease (as is indeed observed). Is that right?

But if you keep training, GD should eventually find a low complexity high test scoring solution - if one exists - because those solutions have an even higher score (with some appropriate regularization term). Obviously much depends on the overparameterization and relative reg term strength - if it's too strong GD may fail or at least appear to fail as it skips the easier high complexity solution stage. I thought that explanation of grokking was pretty clear.

I think I'm still not understanding. Shouldn't the implicit regularization strength of SGD be higher... (read more)

jacob_cannell (1y)
I think grokking requires explicit mild regularization (or at least, it's easier to model how that leads to grokking). The total objective is training loss + reg term. Initially the training loss totally dominates, and GD pushes that down until it overfits (finding a solution with near 0 training loss balanced against reg penalty).

Then GD bounces around on that near 0 training loss surface for a while, trying to also reduce the reg term without increasing the training loss. That's hard to do, but eventually it can find rare solutions that actually generalize (still allow near 0 training loss at much lower complexity). Those solutions are like narrow holes in that surface. You can run it as long as you want, but it's never going to ascend into higher complexity regions than those which enable 0 training loss (model entropy on order data set entropy); the reg term should ensure that.
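Written out, the objective being described is something like the following; the L2 form and the strength λ are illustrative assumptions, since the comment only says "reg term":

```latex
\mathcal{L}_{\text{total}}(\theta)
  \;=\; \underbrace{\mathcal{L}_{\text{train}}(\theta)}_{\text{driven near } 0 \text{ early}}
  \;+\; \lambda \underbrace{\lVert \theta \rVert_2^2}_{\text{slowly reduced while training loss stays near } 0}
```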

I'm not sure I get what the relation would be--double descent is usually with respect to the model size (vs. amount of data), although there is some work on double descent vs. number of training iterations, e.g. https://arxiv.org/abs/1912.02292. But I don't immediately see how to connect this to grokking.

(I agree they might be connected, I'm just saying I don't see how to show this. I'm very interested in models that can explain grokking, so if you have ideas let me know!)

jacob_cannell (1y)
(That arxiv link isn't working btw.)

It makes sense that GD will first find high complexity overfit solutions for an overcomplete model - they are most of the high test scoring solution space. But if you keep training, GD should eventually find a low complexity high test scoring solution - if one exists - because those solutions have an even higher score (with some appropriate regularization term). Obviously much depends on the overparameterization and relative reg term strength - if it's too strong GD may fail or at least appear to fail as it skips the easier high complexity solution stage. I thought that explanation of grokking was pretty clear.

I was also under the impression that double descent is basically the same thing, but viewed from the model complexity dimension. Initially in the under-parameterized regime validation error decreases with model complexity up to a saturation point just below where it can start to memorize/overfit, then increases up to a 2nd worse overfitting saturation point, then eventually starts to decrease again heading into the strongly overparameterized regime (assuming appropriate mild regularization).

In the strongly overparameterized regime 2 things are happening: firstly it allows the model capacity to more easily represent a distribution of solutions rather than a single solution, and it also effectively speeds up learning in proportion by effectively evaluating more potential solutions (lottery tickets) per step. Grokking can then occur, as it requires sufficient overparameterization (whereas in the underparameterized regime there isn't enough capacity to simultaneously represent a sufficient distribution of solutions to smoothly interpolate and avoid getting stuck in local minima).

Looking at it another way: increased model complexity has strong upside that scales nearly unbounded with model complexity, coupled with the single downside of overfitting which saturates at around data memorization complexity.

I don't think it's inferior -- I think both of them have contrasting strengths and limitations. I think the default view in ML would be to use 95% empiricism, 5% philosophy when making predictions, and I'd advocate for more like 50/50, depending on your overall inclinations (I'm 70-30 since I love data, and I think 30-70 is also reasonable, but I think neither 95-5 nor 5-95 would be justifiable).

I'm curious what in the post makes you think I'm claiming philosophy is superior. I wrote this:

> Confronting emergence will require adopting mindsets that are le... (read more)

Also my personal take is that SF, on a pure scientific/data basis, has had one of the best responses in the nation, probably benefiting from having UCSF for in-house expertise. (I'm less enthusiastic about the political response--I think we erred way too far on the "take no risks" side, and like everyone else prioritized restaurants over schools which seems like a clear mistake. But on the data front I feel like you're attacking one of the singularly most reasonable counties in the U.S.)

It seems like the main alternative would be to have something like Alameda County's reporting, which has a couple days fewer lag at the expense of less quality control: https://covid-19.acgov.org/data.page?#cases.

It's really unclear to me that Alameda's data is more informative than SF's. (In fact I'd say it's the opposite--I tend to look at SF over Alameda even though I live in Alameda County.)

I think there is some information lost in SF's presentation, but it's generally less information lost than most alternatives on the market. SF is also backdating th... (read more)


Finding the min-max solution might be easier, but what we actually care about is an acceptable solution. My point is that the min-max solution, in most cases, will be unacceptably bad.

And in fact, since min_x f(theta,x) <= E_x[f(theta,x)], any solution that is acceptable in the worst case is also acceptable in the average case.

davidad (1y)
Agreed—although optimizing for the worst case is usually easier than optimizing for the average case, satisficing for the worst case is necessarily harder (and, in ML, typically impossible) than satisficing for the average case.

Thanks! I appreciated these distinctions. The worst-case argument for modularity came up in a past argument I had with Eliezer, where I argued that this was a reason for randomization (even though Bayesian decision theory implies you should never randomize). See section 2 here: The Power of Noise.

Re: 50% vs. 10% vs. 90%. I liked this illustration, although I don't think your argument actually implies 50% specifically. For instance if it turns out that everyone else is working on the 50% worlds and no one is working on the 90% worlds, you should probably wo... (read more)

I think this probably depends on the field. In machine learning, solving problems under worst-case assumptions is usually impossible because of the no free lunch theorem. You might assume that a particular facet of the environment is worst-case, which is a totally fine thing to do, but I don't think it's correct to call it the "second-simplest solution", since there are many choices of what facet of the environment is worst-case.

One keyword for this is "partial specification", e.g. here is a paper I wrote that makes a minimal set of statistical assumptions... (read more)

paulfchristiano (1y)
Even in ML it seems like it depends on how you formulated your problem/goal. Making good predictions in the worst case is impossible, but achieving low regret in the worst case is sensible. (Though still less useful than just "solve existing problems and then try the same thing tomorrow," and generally I'd agree "solve an existing problem for which you can verify success" is the easiest thing to do.) Hopefully having your robot not deliberately murder you is a similarly sensible goal in the worst case though it remains to be seen if it's feasible.
davidad (1y)
My interpretation of the NFL theorems is that solving the relevant problems under worst-case assumptions is too easy, so easy it's trivial: a brute-force search satisfies the criterion of worst-case optimality. So, that being settled, in order to make progress, we have to step up to average-case evaluation, which is harder. (However, I agree that once we already need to do some averaging, making explicit and stripping down the statistical assumptions and trying to get closer to worst-case guarantees—without making the problem trivial again—is harder than just evaluating empirically against benchmarks.)

Cool paper! One brief comment is this seems closely related to performative prediction and it seems worth discussing the relationship.

Edit: just realized this is a review, not a new paper, so my comment is a bit less relevant. Although it does still seem like a useful connection to make.

David Scott Krueger (formerly: capybaralet) (9mo)
Author here -- yes, we got this comment from reviewers in the most recent round as well. ADS is a bit more general than performative prediction, since it applies outside of prediction contexts. Still very closely related.

On the other hand, the point of our work is something that people in the performative prediction community seem to only slowly be approaching, which is the incentive for ADS. Work on CIDs is much more related in that sense.

As a historical note: we started working on this in March or April 2018; performative prediction was on arXiv in Feb 2020, ours was at a safety workshop in mid 2019, but not on arXiv until Sept 2020.

Oh okay got it! It looks like the behavior is as intended, but one downside from my perspective is that the blog link is not very visually prominent as is--I would expect most readers to not notice it. I care about this mostly because I would like more people to know about my blog's existence, and I think it could be fixed if there was the option to add a small avatar next to the blog name to make it more visually prominent (I could imagine lots of other fixes too but just throwing a concrete one out there).

On a separate note, it looks like the LaTeX is not ... (read more)

habryka (2y)
Yeah, let's also make it a link post then. Some people prefer more prominence, some prefer less, for their cross-posts.
Ruby (2y)
I converted the post from the HTML import in the LW Docs editor and manually fixed up the LaTeX, which handles it for today.

@LW mods: Looks like this one also doesn't link back to Bounded Regret? Could it be because of the italicized text that I put at the top?

Ben Pace (2y)
I'll clarify two things, let me know if your problem is not addressed.

For automatic crossposting, the posts link back to the original blog (not blogpost) in the place shown here: [screenshot omitted]. Note that this does not appear on mobile, because space is very limited and we didn't figure out how to fit it into the UI.

If a person makes a linkpost by adding a link to the small field at the top of the editor, then you get a link to a specific post. That looks like this: [screenshot omitted]. This process is not automatic; linkposts are only made manually.

My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not c... (read more)

@LW Mods: It looks like the embedded IFrame from the original post didn't copy over. Is there some way either to embed it here, or else just copy it over as an image? (Also, it looks like this post doesn't actually link back to my blog like it normally does, not sure why...)

Raemon (2y)
I added the OP as a linkpost url, and added the iframe as an image. I'll look more into how we handle iframes and see if there's a better option there.

Thanks. For time/brevity, I'll just say which things I agree / disagree with:

> sufficiently capable and general AI is likely to have property X as a strong default [...] 

I generally agree with this, although for certain important values of X (such as "fooling humans for instrumental reasons") I'm probably more optimistic than you that there will be a robust effort to get not-X, including by many traditional ML people. I'm also probably more optimistic (but not certain) that those efforts will succeed.

[inside view, modest epistemology]: I don't have... (read more)

I'm not (retroactively in imaginary prehindsight) excited by this problem because neither of the 2 possible answers (3 possible if you count "the same") had any clear-to-my-model relevance to alignment, or even AGI.  AGI will have better OOD generalization on capabilities than current tech, basically by the definition of AGI; and then we've got less-clear-to-OpenPhil forces which cause the alignment to generalize more poorly than the capabilities did, which is the Big Problem.  Bigger models generalizing better or worse doesn't say anything obvio... (read more)

Not sure if this helps, and haven't read the thread carefully, but my sense is your framing might be eliding distinctions that are actually there, in a way that makes it harder to get to the bottom of your disagreement with Adam. Some predictions I'd have are that:

 * For almost any experimental result, a typical MIRI person (and you, and Eliezer) would think it was less informative about AI alignment than I would.
 * For almost all experimental results you would think they were so much less informative as to not be worthwhile.
 * There's a sma... (read more)

I would agree with you that "MIRI hates all experimental work" / etc. is not a faithful representation of this state of affairs, but I think there is nevertheless an important disagreement MIRI has with typical ML people, and that the disagreement is primarily about what we can learn from experiments.

Ooh, that's really interesting. Thinking about it, I think my sense of what's going on is (and I'd be interested to hear how this differs from your sense):

  1. Compared to the average alignment researcher, MIRI tends to put more weight on reasoning like 'sufficient
... (read more)

Would running the method in this paper on EfficientNet count?

What if we instead used a weaker but still sound method (e.g. based on linear programs instead of semidefinite programs)?

Zac Hatfield-Dodds (2y)
On a quick skim it looks like that fails both "not equivalent to executing the model" and the float32 vs R problem. It's a nice approach, but I'd also be surprised if it scales to maintain tight bounds on much larger networks.

It is definitely useful in some settings! For instance it's much easier to collaborate with people not at Berkeley, and in some cases those people have valuable specialized skills that easily outweigh the productivity hit.

I personally have Wednesdays, plus Thursday mornings, as "no meeting days". I think it works pretty well and I know other faculty who do something similar (sometimes just setting mornings as meeting-free). So this does seem like a generally good idea!

Thanks, those are really cool!

I enjoyed this quite a bit. Vision is very important in sports as well, but I hadn't thought to apply it to other areas, despite generally being into applying sports lessons to research (i.e. https://bounded-regret.ghost.io/film-study/).

In sports, you have to choose between watching the person you're guarding and watching the ball / center of play. Or if you're on offense, between watching where you're going and watching the ball. Eye contact is also important for (some) passing.

What's most interesting is the second-level version of this, where good player... (read more)

alkjash (2y)
I love the film study post, thanks for linking! This all reminds me of a "fishbowl exercise" they used to run at the MIRI Summer Fellows program, where everyone crowded around for half an hour and watched two researchers do research. I suppose the main worry about transporting such exercises to research is that you end up watching something like this [https://www.youtube.com/watch?v=MgbdMgyKvWE].