# All of Nathan Helm-Burger's Comments + Replies

DeepMind is hiring for the Scalable Alignment and Alignment Teams

Thanks Rohin. I also feel that interviewing after my 3 more months of independent work is probably the correct call.

2Rohin Shah2d
Update: I think you should apply now and mention somewhere that you'd prefer to be interviewed in 3 months because in those 3 months you will be doing <whatever it is you're planning to do> and it will help with interviewing.
Introduction to Pragmatic AI Safety [Pragmatic AI Safety #1]

This is great! I agree with a lot of what you're saying here and am glad someone is writing these ideas up. Two points of possible disagreement (or misunderstanding on my part perhaps) are:

Highly competitive, pragmatic, no-nonsense culture

I think that competitiveness can certainly be helpful, but too much can be detrimental. Specifically, I think competitiveness needs to go hand-in-hand with cooperation and transparency. Work needs to be shared, and projects need to encompass groups of people. Trying to get the most 'effective research points' from amongst... (read more)

DeepMind is hiring for the Scalable Alignment and Alignment Teams

I'm potentially interested in the Research Engineer position on the Alignment Team, but I'm currently 3 months into a 6-month grant from LTFF to reorient my career from general machine learning to AI safety specifically. My current plan is to keep doing solo work until the last month of my grant period, then begin applying to AI safety work at places like Anthropic, Redwood Research, OpenAI, and DeepMind.

Do you think there's a significant advantage to applying soon vs 3 months from now?

4Rohin Shah3d
Looking into it, I'll try to get you a better answer soon. My current best guess is that you should apply 3 months from now. This runs an increased risk that we'll have filled all our positions / closed our applications, but also improved chances of making it through because you'll know more things and be better prepared for the interviews. (Among other things I'm looking into: would it be reasonable to apply now and mention that you'd prefer to be interviewed in 3 months.)

Seems to me that what CLIP needs is a secondary training regime, where each image is a 2D render of a 3D scene produced by a simulator that can also generate one correct caption and several incorrect captions. Like: red vase on a brown table (correct), blue vase on a brown table (incorrect), red vase on a green table (incorrect), red vase under a brown table (incorrect). Then do the CLIP training with the text set deliberately including these near-miss incorrect captions in addition to the usual random incorrect caption samples. I saw... (read more)
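To make the caption idea concrete, here is a minimal sketch of the caption side only, assuming the simulator exposes each scene as a small structured record. The `Scene` fields, the `ALTERNATIVES` table, and `hard_negatives` are hypothetical names made up for illustration; nothing here is part of CLIP or any real simulator.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scene:
    """Hypothetical structured description of one simulated scene."""
    obj_color: str
    obj: str
    relation: str
    support_color: str
    support: str

    def caption(self) -> str:
        return (f"{self.obj_color} {self.obj} {self.relation} "
                f"a {self.support_color} {self.support}")


# Assumed vocabulary of values the simulator could swap in per attribute.
ALTERNATIVES = {
    "obj_color": ["red", "blue", "green"],
    "support_color": ["brown", "green", "white"],
    "relation": ["on", "under"],
}


def hard_negatives(scene: Scene) -> list[str]:
    """Captions that differ from the true scene in exactly one attribute."""
    negatives = []
    for field, options in ALTERNATIVES.items():
        for value in options:
            if value != getattr(scene, field):
                negatives.append(replace(scene, **{field: value}).caption())
    return negatives


scene = Scene("red", "vase", "on", "brown", "table")
positive = scene.caption()
negatives = hard_negatives(scene)
```

Each rendered image would then be paired with `positive` as its matching text, with `negatives` added to the contrastive batch alongside the usual random captions.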

Report likelihood ratios

I understand much better now what you were saying. Thanks for clarifying.

My hope for my personal work is that nibbling away at the mystery with 'prosaic engineering work' will make the problem clearer, such that profound insights will be easier to generate. I think in science generally it is a good heuristic that when no clear theory exists, and gathering more data is an option, you should go gather the data. Also, use engineering to build better tools with which to gather new data.

6Richard_Ngo22d
Oh yeah, I totally agree with this. Will edit into the piece.
Report likelihood ratios

Question for clarification: I'm confused by what you mean when you say that the null hypothesis is always false. My understanding is that the null hypothesis is generally "these two sets of data were actually drawn from the same distribution", and the hypothesis is generally "these two sets of data are drawn from different distributions, because the variable of interest which differs between them is in the causal path of a distributional change". Do you have a different way of viewing the null hypothesis in mind? (I totally agree with your conclusion that likelihood ratios should be presented along with p-values, and ideally with other details about the data as well.)

5Ege Erdil22d
The null hypothesis in most papers is of the form "some real-valued random variable is equal to zero". This could be an effect size, a regression coefficient, a vector of regression coefficients, et cetera. If that's your null hypothesis then the null hypothesis actually being true is an event of zero probability, in the sense that if your study had sufficient statistical power it would pick up on a tiny signal that would make the variable under consideration (statistically) significantly different from zero. If you believe there are no real-valued variables in the real world then it's merely an event of probability ε>0 where ε is a tiny real number, and this distinction doesn't matter for my purposes. Incidentally, I think this is actually what happened with Bem's parapsychology studies: his methodology was sound and he indeed picked up a small signal, but the signal was of experimenter effects and other problems in the experimental design rather than of paranormal phenomena. My claim is that no matter what you're investigating, a sufficiently powerful study will always manage to pick up on such a small signal. The point is that the null hypothesis being false doesn't tell you much without a useful alternative hypothesis at hand. If someone tells you "populations 1 and 2 have different sample means", the predictive value of that precise claim is nil.
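A small numerical sketch of the point above, under simplifying assumptions that are mine rather than the commenter's: a z-test on a normally distributed sample mean with known standard deviation, where the observed mean happens to equal a tiny true effect. The function names are illustrative.

```python
import math


def two_sided_p_value(sample_mean: float, sigma: float, n: int) -> float:
    """Two-sided z-test p-value for the null hypothesis 'true mean == 0'."""
    z = abs(sample_mean) * math.sqrt(n) / sigma
    return math.erfc(z / math.sqrt(2))


def log_likelihood_ratio(sample_mean: float, sigma: float, n: int,
                         alt_mean: float) -> float:
    """log[ P(data | mean = alt_mean) / P(data | mean = 0) ]."""
    squared_standard_error = sigma ** 2 / n
    return ((sample_mean ** 2 - (sample_mean - alt_mean) ** 2)
            / (2 * squared_standard_error))


tiny_effect = 0.01  # assumed effect size, in units of sigma
p_small_study = two_sided_p_value(tiny_effect, 1.0, 100)
p_huge_study = two_sided_p_value(tiny_effect, 1.0, 1_000_000)

# Compare the null against a specific, substantively interesting alternative.
llr_vs_medium_effect = log_likelihood_ratio(tiny_effect, 1.0, 1_000_000,
                                            alt_mean=0.5)
```

With these made-up numbers the small study is nowhere near significance, the huge study rejects the null decisively, and yet the log likelihood ratio is strongly negative: the same data favor "no effect" over the "medium effect" alternative. Rejecting the null on its own says little until a concrete alternative is on the table.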
Ideal governance (for companies, countries and more)

That's been tried a few times with some mixed successes and some outright failures. Something that hasn't been thoroughly tried (to my knowledge) is actually making a distributed governance system within / on-top-of existing governments. We can't subtract laws or taxes or currencies, but we can add them. What if we tried making a distributed government with its own currency and governance system and some additional laws? It's a way to test out some theories in a low-risk, low-startup-cost scenario, and could be fun. If the government starts out with a small budget per member and limited responsibilities, there would be little pressure to get it right on the first try.

Ideal governance (for companies, countries and more)

Well, I don't agree with the underlying argument, but to try to steelman it I'd say something like: Markets are pretty efficient at causing economic growth, causing us to 'grow the pie' (more total absolute wealth for everyone, as opposed to 'sharing the pie' which refers to distributing wealth more evenly), and moving resources around to where they are most needed to allow large complex profit-seeking systems to function. All those little decisions made by selfish semi-rational actors add up to much better decision-making about profitable uses of resource... (read more)

1TAG25d
Ideal governance (for companies, countries and more)

Epistocracy as described by Jason Brennan is similar to things I've been thinking about. Also sortition, which is related but different. Here are some of my thoughts:

If we have a system of 'extrapolating volition' of segments of the population by taking representative samples of people and paying them generously to learn and think and debate on a key issue for a couple of weeks, then give their best full answer as to what should be done... And then also quiz them on their factual understanding of relevant systems, and register their predictions about outco... (read more)

Ideal governance (for companies, countries and more)

Others have mentioned Jason Brennan's work, but I wanted to specifically add this comment he made about practical initial trials of a first step towards epistocracy in his effective altruism AMA.

Are there any small-scale experiments with Epistocracy you think that countries or other jurisdictions should try as a first stab at testing this form of government? What would you like to see and where?

I'd like to try enlightened preference voting in Denmark or New Hampshire.

How it works:
1. Everyone votes for their p

1Nathan Helm-Burger1mo
Epistocracy as described by Jason Brennan is similar to things I've been thinking about. Also sortition, which is related but different. Here are some of my thoughts: If we have a system of 'extrapolating volition' of segments of the population by taking representative samples of people and paying them generously to learn and think and debate on a key issue for a couple of weeks, then give their best full answer as to what should be done... And then also quiz them on their factual understanding of relevant systems, and register their predictions about outcomes... I think this is in some sense a more 'fair' look at what that segment of the population would think if given plenty of time to think. Then from there we can make a model which lets us average these opinions and extrapolate (compensate for biases in the predictions, etc.), but in order to check that we've 'extrapolated well' we should check our extrapolations with different members of that population segment. This process can be repeated several times until we get good agreement on the extrapolation. I think extrapolating is a good idea, but I worry it would be vulnerable to abuse if you didn't then have the step of the people who are being extrapolated for approving of the extrapolation. Also, I have some thoughts along the same lines as Jason Brennan's criticism of democracy as being 'a system of pushing around the minority.' Growing up in the liberal Quaker tradition, I've spent a lot of time in groups of people trying to do decision making via consensus. The Quaker consensus process allows for some small portion of the group to 'stand aside' to allow a decision to go through that they disagree with, but this is uncommon. When it does happen, it's usually less than 5% of the group. It occurs more often in groups of over 200 people than in the more typical consensus-seeking groups of 20-100 people. I feel like the insights from this process for me are that it doesn't scale well, and takes a lot of eff
What Would A Fight Between Humanity And AGI Look Like?

As someone who was recently the main person doing machine learning research & development in a 'data driven company', I can confirm that we were working as hard as we could to replace human decision making with machine learning models on every level of the business that we could. It worked better, made more money, more reliably, with fewer hours of human work input. Over the years I was there we gradually scaled down the workforce of human decision makers and scaled up the applications of ML and each step along that path was clearly a profit win. Money... (read more)

What Would A Fight Between Humanity And AGI Look Like?

You get most of your learning from experiences? I sure don't. I get most of mine from reading, and I expect an AGI even close to human-level will also be able to learn from the logical abstractions of the books it reads. I think what you're saying would be true if we agreed to not train AI models on text, but only on things like toy physical models. But currently, we're feeding in tons of data meant to educate humans about the world, like textbooks on every subject and scientific papers, and all of Wikipedia, and personal stories from Reddit and... everyt... (read more)

3tailcalled1mo
Examples of things that are right around me right now that I've not learned through reading: door, flask, lamps, tables, chairs, honey, fridge, .... I've definitely learned a lot from reading, though typically even when reading about stuff I've learned even more by applying what I've read in practice, as words don't capture all the details [http://johnsalvatier.org/blog/2017/reality-has-a-surprising-amount-of-detail].
What Would A Fight Between Humanity And AGI Look Like?

I think that by the time we have a moderately-superhuman agentive unaligned AGI, we're in deep trouble. I think it's more interesting and useful to stay focused on the time leading up to that which we must cross through to get to the deep trouble.

Particularly, I am hopeful that there will be some level of sub-human AGI (either sub-human intelligence or sub-human-speed) which tries some of the things we predict might happen, like a deceptive turn, but underestimates us, and we catch it in time. Or we notice the first crazy weird bad thing happen, and ... (read more)

The Scale Problem in AI

I think any AI sufficiently good at world modeling to be much of a threat is also going to be quite good at learning from abstractions made by others. In other words, it can just read a history book to learn about political change; it doesn't need years of 'personal' experience. It's pretty standard nowadays to give models access to training data encompassing a ton of detail about the world. Just because they currently don't grok the physics textbooks, medical papers, and Wikipedia entries doesn't mean that limitation will remain true.

2tailcalled1mo
I agree that there is a strong incentive towards making AI capable of understanding these things. The interesting question is going to be, how quickly will that succeed, and what will the properties of the kinds of models that succeed in this be?
Are deference games a thing?

Yeah, I think the idea needs some careful thought to flesh it out a bit more. If voting is not anonymous and scheduled, then it's too easy for coercion and such to enter the picture. On the plus side, once you do have something solid enough to test, you can start with local elections. For instance, look at the Center for Election Science and their work on converting local elections to approval voting.

Convince me that humanity *isn’t* doomed by AGI

I think I am more optimistic than most about the idea that experimental evidence and iterative engineering of solutions will become increasingly effective as we approach AGI-level tech. I place most of my hope for the future on a sprint of productivity around alignment near the critical juncture. I think it makes sense that this transition period between theory and experiment feels really scary to the theorists like Eliezer. To me, a much more experimentally minded person, it feels like a mix of exciting tractability and looming danger. To put it poetically... (read more)

Clem's Memo

The call to world leaders to update their conceptions of the strategic landscape based on a game-changing new technology seems incredibly relevant to the impending development of AGI. And also, in a sadly retro way, relevant to the resurgence of nuclear world war risk from Russia's recent aggression.

Feature proposal: Close comment as resolved

I think this is a nice idea, especially if it's just a karma-sorting effect, and it's either author-only or implements the author-can-undo system mentioned by Viliam. But actually, I came here to say that I'm even more excited for a feature which is nearly the opposite: I want comments to be able to be tagged by the author as 'ToDo'. This would let readers know that the author has seen the comment and intends to respond, and the author could have a 'ToDo' page which listed their ToDo-flagged comments sorted either chronologically or by karma. You could even allow readers to up/down vote the ToDo flag to help the author prioritize their responses.

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory. However, I don't think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed, were predicting this problem accurately and convincingly enough that people like myself updated. 15 years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was the... (read more)

3shminux1mo
Hmm, I agree that Eliezer, MIRI and its precursors did a lot of good work raising the profile of this particular x-risk. However, I am less certain of their theoretical contributions, which you describe as I guess they did highlight a lot of dead ends, gotta agree with that. I am not sure how much the larger AI/ML community values their theoretical work. Maybe the practitioners haven't caught up yet. Well, whatever the fraction, it certainly seems like it's time to rebalance it, I agree. I don't know if MIRI has the know-how to do experimental work at the level of the rapidly advancing field.
Refine: An Incubator for Conceptual Alignment Research Bets

I'm really excited for the outcomes you describe: more relentlessly resourceful independent researchers exploring a wider range of options. I do feel a bit concerned that your search for good applicants is up against a challenge. I think that both the intelligence necessary to produce good results and the personality trait of agentiveness such that they can become relentlessly resourceful with training are rare and largely determined early in life. And I think that, given this, a lot of such people will already be quite absorbed in profitable paths by the ... (read more)

What more compute does for brain-like models: response to Rohin

Ah, gotcha. I predict yes, with quite high confidence (like 95%), for 12 OOMs and using the Blue Brain Project. The others I place only small confidence in (maybe 5% each). I really think the BBP has enough detail in its model to make something very like a human neocortex, and capable of being an AGI, if scaled up.

What more compute does for brain-like models: response to Rohin

I wrote this specifically aimed at the case of "in the thought experiment where humanity got 12 orders of magnitude more compute this year, what would happen in the next 12 months?" I liked the post that Daniel wrote about that and wanted to expand on it. My claim is that even if everything that was mentioned in that post was tried and failed, there would still be these things to try. They are algorithms which already exist, which could be scaled up if we suddenly had an absurd amount of compute. Not all arguments about why standard approaches like Transformers fail also apply to these alternate approaches.

1Rohin Shah1mo
Right, but I don't yet understand what you predict happens. Let's say we got 12 OOMs of compute and tried these things. Do we now have AGI? I predict no.
What more compute does for brain-like models: response to Rohin

I don't think they are better representations of general intelligence. I'm quite confident that much better representations of general intelligence exist and just have yet to be discovered. I'm just saying that these are closer to a proven path, and although they are inefficient and unwise, somebody would likely follow these paths if suddenly given huge amounts of compute this year. And in that imaginary scenario, I predict they'd be pretty effective.

My reasoning for saying this for the Blue Brain Project is that I've read a lot of their research papers, a... (read more)

Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain

In a similar conversation about non-main-actor paths to dangerous AI I came up with this as an example of a path I can imagine being plausible and dangerous: A plausible-to-me worst case scenario would be something like:
A phone-scam organization employs someone to build them an online-learning reinforcement learning agent (using an open-source language model as a language-understanding-component) that functions as a scam-helper. It takes in the live transcription of the ongoing conversation between a scammer and a victim, and gives the scammer suggestions f... (read more)

What more compute does for brain-like models: response to Rohin

I was using flp as an abbreviation. And I'll read Joe Carlsmith's report and then let you know what I think.

edit: Oh yeah, and one thing to keep in mind is these are estimates for if we suddenly had a shockingly big jump in amount of compute (12 orders of magnitude) but no time to develop or improve existing algorithms. So my estimates for 'what could a reasonably well engineered algorithm, that had been tested and iterated on at scale, do?' would be much, much lower. This is a stupidly wasteful upper bound.

What more compute does for brain-like models: response to Rohin

Yes, I'll fix that link [edit: fixed]. I have not yet thought hard about failure modes and probabilities for these cases. I can work on that and let you know what I come up with.

What more compute does for brain-like models: response to Rohin

I think spiking neural nets are at least 1, probably more like 2 OOMs more compute intensive to train, similarly effective, somewhat more efficient at learning from data. I think Numenta is probably even harder to train and even more data efficient. I can certainly test these hypotheses at small scale. I'll let you know what I find.

2Rohin Shah1mo
If you think spiking neural nets are more compute intensive, then why does this matter? It seems like we'd just get AGI faster with regular neural nets? (I think compute is more likely to be the bottleneck than data, so the data efficiency doesn't seem that relevant.) Perhaps you think that if we use spiking neural nets, then we only need to train it for 15 human-years-equivalent to get AGI (similar to the Lifetime anchor in the bio anchors report [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines] ), but that wouldn't be true if we used regular neural nets? Seems kinda surprising. Maybe you think that the Lifetime anchor in bio anchors is the best anchor to use and so you have shorter timelines?
Fun with +12 OOMs of Compute

Blue Brain does actually have a human brain model waiting in the wings, it just tries to avoid mentioning that. A media-image management thing. I spent the day digging into your question about OOMs, and now have much more refined estimates. Here's my post: https://www.lesswrong.com/posts/5Ae8rcYjWAe6zfdQs/what-more-compute-does-for-brain-like-models-response-to

Fun with +12 OOMs of Compute

Ok, as a former neuroscientist who has spent a lot of years (albeit not recent ones) geeking out about, downloading, and playing with various neural models, I'd like to add to this discussion. First, the worm stuff seems overly detailed and focused on recreating the exact behavior rather than 'sorta kinda working like a brain should'. A closer, more interesting project to look at (but still too overly specific) is the Blue Brain project [ https://www.epfl.ch/research/domains/bluebrain/ ]. Could that work with 12 more OOMs of compute? I feel quite confident... (read more)

5Rohin Shah1mo
(I know ~nothing about any of this, so might be misunderstanding things greatly) 12 OOMs is supposed to get us human-level AGI, but BlueBrain seems to be aiming at a mouse brain? "It takes 12 OOMs to get to mouse-level AGI" seems like it's probably consistent with my positions? (I don't remember the numbers well enough to say off the top of my head.) But more fundamentally, why 12 OOMs? Where does that number come from? From a brief look at the website, I didn't immediately see what cool stuff Nengo could do with 2019 levels of compute, that neural networks can't do. Same for Numenta.
A Brief Excursion Into Molecular Neuroscience

Absolutely. In fact, I think the critical impediment to machine learning being able to learn more useful things from the current amassed neuroscience knowledge is: "but which of these many complicated bits are even worth including in the model?" There's just too much, and so much is noise, or incompletely understood such that our models of it are incomplete enough to be worse-than-useless.

A concrete bet offer to those with short AI timelines

Since my goal is to convince people that I take my beliefs seriously, and this amount of money is not actually going to change much about how I conduct the next three years of my life, I'm not worried about the details. Also, I'm not betting that there will be a FOOM scenario by the conclusion of the bet, just that we'll have made frightening progress towards one.

A Brief Excursion Into Molecular Neuroscience

Oh man, I spent so many years of grad school looking up these acronyms and handwriting their full names into the papers I was reading, until I memorized enough of them. The acronym soup is silly, and so is the formal paper language, which ends up obscuring the true confidence level of the observations. So much overstatement of limited evidence... Still, I like this stuff. One thing I'd like to add is that when working with complicated interaction mechanisms like this that aren't fully known, I find it super helpful to run computer simulations of competing hypothe... (read more)

2Jan1mo
Yes, I agree, a model can really push intuition to the next level! There is a failure mode where people just throw everything into a model and hope that the result will make sense. In my experience that just produces a mess, and you need some intuition for how to properly set up the model.
What an actually pessimistic containment strategy looks like

I just expect much better outcomes from the path of close collaboration. To me it seems like I am a "nuclear power plant safety engineer", and they are the "nuclear power plant core criticality engineers". Preventing the power plant from getting built would mean it wasn't built unsafely, but since I believe that would leave a void in which someone else would be strongly inclined to build their version... I just see the best likelihood of a good outcome in the paths near 'we work together to build the safest version we can'.

A concrete bet offer to those with short AI timelines

Nice specific breakdown! Sounds like you side with the authors overall. Want to also make the 3:1 bet with me?

1Lorenzo Rex1mo
Thanks. Yes, pretty much in line with the authors. Btw, I would be super happy to be wrong and see advancement in those areas, especially the robotic one. Thanks for the offer, but I'm not interested in betting money.
Finally Entering Alignment

My thought, as a researcher who is pretty good at roughshod programming but not so good at rock-solid tested-everything programming, is that programming/engineering is big. Focusing on a specific aspect that is needed and also interesting to you might be advantageous, like supercomputing / running big Spark clusters or security / cryptography.

What an actually pessimistic containment strategy looks like

I think the framing of "convince leading AI researchers to willingly work more closely with AI alignment researchers, and think about the problem more themselves" is the better goal. I don't think hampering them generally is particularly useful/effective, and I don't think convincing them entirely to "AGI is very scary" is likely either.

3lc1mo
This is such a weird sentiment to me. I hear it a lot, and predicated on similar beliefs about AGI I feel like it's missing a mood. If someone were misassembling a bomb inside a children's hospital, would you still say "hampering them generally isn't particularly useful/effective"? Would your objective be to get them to "think more about the problem themselves"? There's a lack of urgency in these suggestions that seems unnatural. The overarching instrumental goal is to get them to stop.
A concrete bet offer to those with short AI timelines

Ok, I take your bet for 2030. I win, you give me $1000. You win, I give you $3000. Want to propose an arbiter? (Since someone else also took the bet, I'll get just half the bet: their $500 vs my $1500.)
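For readers who want the arithmetic behind these stakes, here is a minimal sketch. It only restates the payoffs quoted above and says nothing about anyone's actual credences; it also ignores discounting and the question of whether the money is worth anything at payout time. The function names are illustrative.

```python
def break_even_probability(win_amount: float, loss_amount: float) -> float:
    """Probability of winning at which the bet has zero expected value."""
    return loss_amount / (win_amount + loss_amount)


def expected_value(p_win: float, win_amount: float, loss_amount: float) -> float:
    """Expected payoff for the side that receives win_amount or pays loss_amount."""
    return p_win * win_amount - (1 - p_win) * loss_amount


full_bet = break_even_probability(1000, 3000)  # receive $1000 or pay $3000
half_bet = break_even_probability(500, 1500)   # the halved stakes
```

Halving the stakes leaves the break-even probability unchanged, since only the ratio of the two amounts matters. The side risking $3000 to win $1000 comes out ahead in expectation only if they put their chance of winning above that break-even value.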

Shouldn't it be: 'They pay you $1,000 now, and in 3 years, you pay them back plus $3,000' (as per Bryan Caplan's discussion in the latest 80k podcast episode)? The money won't do anyone much good if they receive it in a FOOM scenario.

Google's new 540 billion parameter language model

That's already what TPUs do, basically

Fixing The Good Regulator Theorem

Trying to think about this from more of a practical machine learning standpoint, but without full understanding of all the points you made...

I'm thinking of the case where X, Y, and S are all partially but not fully known, and you have to choose model M so as to minimize regret over time. Two things occur to me as possibly useful strategies for choosing M. You might find the opportunity to run 'experiments', to choose suboptimal output R at an early timestep such that from then on you'd have a better understanding of S, and be able to better refine M for ... (read more)

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

About 15 years ago, before I'd started professionally studying and doing machine learning research and development, my timeline had most of its probability mass around 60 - 90 years from then. This was based on my neuroscience studies and thinking about how long I thought it would take to build a sufficiently accurate emulation of the human brain to be functional. About 8 years ago, studying machine learning full time, AlphaGo coming out was inspiration for me to carefully rethink my position, and I realized there were a fair number of shortcuts off my lon... (read more)

Project Intro: Selection Theorems for Modularity

I recall something similar about a robot hand trained in varying simulations. I remember an OpenAI project, not a DeepMind one... Here's the link to the OpenAI environment-varying learner: https://openai.com/blog/learning-dexterity/

2Pattern1mo
I mixed up deepmind and openai.
the scaling “inconsistency”: openAI’s new insight

I think there's something we could do even beyond choosing the best of existing data points to study. I think we could create data-generators, which could fill out missing domains of data using logical extrapolations. I think your example is a great type of problem for such an approach.

How can a layman contribute to AI Alignment efforts, given shorter timeline/doomier scenarios?

Well, I don't think that focusing on the most famous slightly-ahead organizations is actually all that useful. I'd expect that the next-best-in-line would just step forward. Impeding data centers around the world would likely be more generally helpful. But realistically for an individual, trying to be helpful to the AI safety community in a non-direct-work way is probably your best bet at contributing.

1lc1mo
DeepMind is helping every other organization out by publishing research. It's much more of a direct impediment to hamper deepmind than I think you're expecting.
Progress Report 2

Looking at a couple hundred of these plots for the MLP neurons, I see two obvious patterns which can occur alone or together/overlapping. One pattern is a 'vertical group', a narrow band that runs through many layers. The other pattern is a 'horizontal group', which is lots of involvement within one layer.

Now I'm generating and looking at plots which do both the MLP neurons and the attention heads.

Procedurally evaluating factual accuracy: a request for research

I have a friend who has been working on a team doing automatic factual responses to search queries. I'll send him the link to this article and maybe he'll have some thoughts...