
JamesH

Comments
ARENA 6.0 - Call for Applicants
JamesH · 21d

I'm sorry it took you so long to fill out! I hope you're correct that you were slower than the median by a decent amount, since I don't want the application to take up quite so much of people's time. However, definitely appreciate you letting us know how long it took you.

Through the application, we want to get a sense of people's experience in AI safety, their future career plans, and how they engage with AI safety material. We also want to gauge their technical experience, since it can be a pretty technically demanding course, and of course we need a bunch of logistical information.

We tend to get a lot of signal from virtually every part of the application (apart from some of the logistical questions, but those should be relatively quick to fill out). We have thought about trying to cut it down somewhat, but found it difficult to remove anything.

Reply
ARENA 5.0 - Call for Applicants
JamesH · 5mo

We do want ARENA participants to have quite a strong interest in AI safety, which is why we ask people to evidence some substantial engagement with AI safety agendas (which is what that question is designed to do). However, we're not looking for perfect answers to either question; nothing should take hours of research if you engage consistently with LessWrong/the Alignment Forum.

That said, it's not that you must finish the application within 60-90 minutes; that's just a rough estimate of how long it would take someone who's already engaged with AI and AI safety to complete it (an estimate which may be wrong, sorry about that!). The estimate doesn't assume that people are doing a lot of research to provide the highest-quality answers to these questions, since that's really not what we're expecting. Of course, you're free to spend as much or as little time as you want on the application.

Reply
Singular learning theory: exercises
JamesH · 10mo

I think there's a mistake in exercise 17: \sin(x) is not a diffeomorphism between (-\pi, \pi) and (-1, 1) (for instance, it is not bijective between these domains). Either you mean \sin(x/2), or the interval bounds should be (-\pi/2, \pi/2).
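For what it's worth, the failure of injectivity (and the correctness of either fix) is easy to check numerically; a minimal sketch in Python, purely illustrative:

```python
import math

# sin is not injective on (-pi, pi): two distinct points map to the
# same value, so it can't be a diffeomorphism onto (-1, 1).
a, b = math.pi / 4, 3 * math.pi / 4
assert not math.isclose(a, b)
assert math.isclose(math.sin(a), math.sin(b))  # sin(pi/4) == sin(3*pi/4)

# Fix 1: sin(x/2) is strictly increasing on (-pi, pi),
# since its derivative cos(x/2)/2 > 0 there.
xs = [(-math.pi + 1e-6) + k * (2 * math.pi - 2e-6) / 1000 for k in range(1001)]
fix1 = [math.sin(x / 2) for x in xs]
assert all(u < v for u, v in zip(fix1, fix1[1:]))  # strictly monotone

# Fix 2: sin(x) is strictly increasing on (-pi/2, pi/2),
# since cos(x) > 0 there.
ys = [(-math.pi / 2 + 1e-6) + k * (math.pi - 2e-6) / 1000 for k in range(1001)]
fix2 = [math.sin(y) for y in ys]
assert all(u < v for u, v in zip(fix2, fix2[1:]))
```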

Reply
AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
JamesH · 1y

ARENA might end up teaching this person some mech-interp methods they haven't seen before, although it sounds like they would be more than capable of self-teaching any mech-interp. The other potential value-add for your acquaintance would be if they wanted to improve their RL or Evals skills, and have a week to conduct a capstone project with advisors. If they were mostly aiming to improve their mech-interp ability by doing ARENA, there would probably be better ways to spend their time.

Reply
Project proposal: Testing the IBP definition of agent
JamesH · 3y

The way we see this project going concretely looks something like:

First things first, we want to build a good enough theoretical background in IBP. This will ultimately result in something like a distillation of IBP that we will use as a reference, and that we hope others will get a lot of use from.

In this process, we will be doing most of our testing in a theoretical framework. That is to say, we will be constructing model agents and seeing how InfraBayesian Physicalism actually deals with these in theory, whether it breaks down at any stage (as judged by us), and if so whether we can fix or avoid those problems somehow.

What comes after this, as we see it at the moment, is trying to implement the principles of Infra-Bayesian Physicalism in a real-life, honest-to-god Inverse Reinforcement Learning (IRL) proposal. We think IBP stands a good chance of being able to patch some of the largest problems in IRL, which should ultimately be demonstrable by actually making an IRL proposal that works robustly. (When this inevitably fails the first few times, we will probably return to step 1, having gained useful insights, and iterate.)

Reply
Posts

26 · ARENA 6.0 - Call for Applicants · 1mo · 3 comments
35 · ARENA 5.0 - Call for Applicants · 5mo · 2 comments
43 · ARENA 4.0 Impact Report · 8mo · 3 comments
57 · AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 · 1y · 7 comments
37 · Inner Alignment via Superpowers · 3y · 13 comments
59 · Finding Goals in the World Model · Ω · 3y · 8 comments
76 · The Core of the Alignment Problem is... · Ω · 3y · 10 comments
21 · Project proposal: Testing the IBP definition of agent · 3y · 4 comments
27 · Translating between Latent Spaces · 3y · 2 comments
14 · Formalizing Deception · 3y · 2 comments