mesaoptimizer

https://mesaoptimizer.com

 

learn math or hardware

Wiki Contributions

Comments

I searched for it and found none. The twitter conversation also seems to imply that there has not been a paper / technical report out yet.

Based on your link, it seems like nobody even submitted anything to the contest throughout the time it existed. Is that correct?

I expect that Ryan means to say one of the these things:

  1. There isn't enough funding for MATS grads to do useful work in the research directions they are working on, that have already been vouched for by senior alignment researchers (especially their mentors) to be valuable. (Potential examples: infrabayesianism)
  2. There isn't (yet) institutional infrastructure to support MATS grads to do useful work together as part of a team focused on the same (or very similar) research agendas, and that this is the case for multiple nascent and established research agendas. They are forced to go to academia and disperse across the world instead of being able to work together in one location. (Potential examples: selection theorems, multi-agent alignment (of the sort that Caspar Oesterheld and company work on))
  3. There aren't enough research managers in existing established alignment research organizations or frontier labs to enable MATS grads to work on the research directions they consider extremely high value, and would benefit from multiple people working together on (Potential examples: activation steering)

I'm pretty sure that Ryan does not mean to say that MATS grads cannot do useful work on their own. The point is that we don't yet have the institutional infrastructure to absorb, enable, and scale new researchers the way our civilization has for existing STEM fields via, say, PhD programs or yearlong fellowships at OpenAI/MSR/DeepMind (which are also pretty rare). AFAICT, the most valuable part of such infrastructure in general is the ability to co-locate researchers working on the same or similar research problems -- this is standard for academic and industry research groups, for example, and from experience I know that being able to do so is invaluable. Another extremely valuable facet of institutional infrastructure that enables researchers is the ability to delegate operations and logistics problems -- particularly the difficulty of finding grant funding, interfacing with other organizations, getting paperwork handled, etc.

I keep getting more and more convinced, as time passes, that it would be more valuable for me to work on building the infrastructure to enable valuable teams and projects, than to simply do alignment research while disregarding such bottlenecks to this research ecosystem.

I've become somewhat pessimistic about encouraging regulatory power over AI development recently after reading this Bismarck Analysis case study on the level of influence (or lack of it) that scientists had over nuclear policy.

The impression I got from some other secondary/tertiary sources (specifically the book Organizing Genius) was that General Groves, the military man who was the interface between the military and Oppenheimer and the Manhattan Project, did his best to shield the Manhattan Project scientists from military and bureaucratic drudgery, and that Vannevar Bush was someone who served as an example of a scientist successfully steering policy.

This case study seems to show that Groves was significantly less of a value add than I thought given the likelihood of him having destroyed Leo Szilard's political influence (and therefore Leo's ability to influence nuclear policy in a direction of preventing an arms race or using it in war). Bush also seems like a disappointment -- he waited months for information to pass through 'official channels' before he attempted to persuade people like FDR to begin a nuclear weapons development program. On top of that, Bush seemed like he internalized the bureaucratic norms of the political and military hierarchy he worked in -- when a scientist named Ernest Lawrence tried to reach the relevant government officials to talk about the importance of nuclear weapons development, Bush (according to this paper) got annoyed by him seemingly bypassing the 'chain of command' (I assume by focusing on talking to people Bush would report to, instead of to Bush himself) that he threatened to politically marginalize Ernest.

Finally, I see clear parallels between the ineffective attempts by these physicists at influencing nuclear weapons policy via contributing technically and trying to build 'political capital', and the ineffective attempts by AI safety engineers and researchers who decide to go work at frontier labs (OpenAI is the clearest example) with the intention of building influence with the people in there so that they can steer things in the future. I'm pretty sure at this point that such a strategy is a pretty bad idea, given that it seems better to do nothing than to contribute to accelerating towards ASI.

There are galaxy-brained counter-arguments to this claim, such as davidad's supposed game-theoretic model that (AFAICT) involves accelerating to AGI powerful enough to make the provable safety agenda viable, or Paul Christiano's (again, AFAICT) plan that's basically 'given intense economic pressure for better capabilities, we shall see a steady and continuous improvement, so the danger actually is in discontinuities that make it harder for humanity to react to changes, and therefore we should accelerate to reduce compute overhang'. I remain unconvinced by them.

I’m optimizing for consistently writing and publishing posts.

I agree with this strategy, and I plan to begin something similar soon. I forgot that Epistemological Fascinations is your less polished and more "optimized for fun and sustainability" substack. (I have both your substacks in my feed reader.)

I really appreciate this essay. I also think that most of it consists of sazens. When I read your essay, I find my mind bubbling up concrete examples of experiences I've had, that confirm or contradict your claims. This is, of course, what I believe is expected from graduate students when they are studying theoretical computer science or mathematics courses -- they'd encounter an abstraction, and it is on them to build concrete examples in their mind to get a sense of what the paper or textbook is talking about.

However, when it comes to more inchoate domains like research skill, such writing does very little to help the inexperienced researcher. It is more likely that they'd simply miss out on the point you are trying to tell them, for they haven't failed both by, say, being too trusting (a common phenomenon) and being too wary of 'trusting' (a somewhat rare phenomenon for someone who gets to the big leagues as a researcher). What would actually help is either concrete case studies, or a tight feedback loop that involves a researcher trying to do something, and perhaps failing, and getting specific feedback from an experienced researcher mentoring them. The latter has an advantage that one doesn't need to explicitly try to elicit and make clear distinctions of the skills involved, and can still learn them. The former is useful because it is scalable (you write it once, and many people can read it), and the concreteness is extremely relevant to allowing people to evaluate the abstract claims you make, and pattern match it to their own past, current, or potential future experiences.

For example, when reading the Inquiring and Trust section, I recall an experience I had last year where I couldn't work with a team of researchers, because I had basically zero ability to defer (and even now as I write this, I find the notion of deferring somewhat distasteful). On the other hand, I don't think there's a real trade-off here. I don't expect that anyone needs to naively trust that other people they are coordinating with will have their back. I'd probably accept the limits to coordination, and recalibrate my expectations of the usefulness of the research project, and probably continue if the expected value of working on the project until it is shipped is worth it (which in general it is).

When reading the Lightness and Diligence section, I was reminded of the Choudhuri 1985 paper, which describes the author's notion of a practice of "partial science", that is, an inability to push science forward due to certain systematic misconceptions of how basic (theoretical physics, in this context) science occurs. One misconception involves a sort of distaste around working on 'unimportant' problems, or problems that don't seem fundamental, while only caring about or willing to put in effort to solve 'fundamental' problems. The author doesn't make it explicit, but I believe that he believed that the incremental work that scientists do is almost essential for building their knowledge and skill to make their way forwards towards attacking these supposedly fundamental problems, and the aversion to working on supposedly incremental research problems leads people to being stuck. This seems very similar to the thing you are pointing at when you talk about diligence and hard work being extremely important. The incremental research progress, to me, seems similar to what you call 'cataloguing rocks'. You need data to see a pattern, after all.

This is the sort of realization and thinking I wouldn't have if I did not have research experience or did not read relevant case studies. I expect that Mesa of early 2023 would have mostly skimmed and ignored your essay, simply because he'd scoff at the notion of 'Trust' and 'Lightness' being relevant in any way to research work.

GPT-4o can not reproduce the string, and instead just makes up plausible candidates. You love to see it.

Hmm. I assume you could fine-tune away an LLM from reproducing the string. Eliciting it would just become more difficult. Try posting canary text, and a part of the canary string, and see if GPT-4o completes it.

Load More