How to turn money into AI safety?

by Charlie Steiner, 25th Aug 2021

Related: "Suppose $1 billion is given to AI Safety. How should it be spent?", "EA is vetting-constrained", "What to do with people?"

I

I have heard through the grapevine that we seem to be constrained - there's money that donors and organizations might be happy to spend on AI safety work, but aren't because of certain bottlenecks - perhaps talent, training, vetting, research programs, or research groups are in short supply. What would the world look like if we'd widened some of those bottlenecks, and what are local actions that people can take to move in that direction? I'm not an expert on either the funding or the organizational side, but hopefully I can leverage Cunningham's law and get some people more in the know to reply in the comments.

Of the bottlenecks I listed above, I am going to mostly ignore talent. IMO, talented people aren't the bottleneck right now, and the other problems we have are more interesting. We need to be able to train people in the details of an area of cutting-edge research. We need a larger number of research groups that can employ those people to work on specific agendas. And perhaps trickiest, we need to do this within a network of reputation and vetting that makes it possible to selectively spend money on good research without warping or stifling the very research it's trying to select for.

In short, if we want to spend money, we can't just hope that highly-credentialed, high-status researchers with obviously-fundable research will arise by spontaneous generation. We need to scale up the infrastructure. I'll start by taking the perspective of individuals trying to work on AI safety - how can we make it easier for them to do good work and get paid?

There is a series of bottlenecks in the pipeline from interested amateur to salaried professional. From the individual entrant's perspective, they have to start with learning and credentialing. The "obvious path" of training to do AI safety research looks like getting a bachelor's or PhD in public policy, philosophy, computer science, or math (for which there are now fellowships, which is great), trying to focus your work towards AI safety, and doing a lot of self-study on the side. These programs are often an imprecise fit for the training we want - we'd like there to be graduate-level classes that students can take that cover important material in AI policy, technical alignment research, the philosophy of value learning, etc.

Opportunity 1: Develop course materials and possibly textbooks for teaching courses related to AI safety. This is already happening somewhat. Encourage other departments and professors to offer courses covering these topics.

Even if we influence some parts of academia, we may still have a bottleneck where there aren't enough departments and professors who can guide and support students focusing on AI safety topics. This is especially relevant if we want to start training people fast, as in six months from now. To bridge this gap, it would be nice to have training programs, admitting people with bachelor's- or master's-level skills, at organizations doing active AI safety research. Like a three-way cross between internship, grad school, and AI Safety Camp. The intent is not just to have people learn and do work, but also to help them produce credible signals of their knowledge and skills, over a timespan of 2-5 years. Not just being author number 9 out of 18, but having output that they are primarily responsible for. The necessity of producing credible signals of skill makes a lot of sense when we look at the problem from the funders' perspective later.

Opportunity 2: Expand programs located at existing research organizations that fulfill training and signalling roles. This would require staff for admissions, support, and administration.

This would also provide an opportunity for people who haven't taken the "obvious path" through academia, of which there are many in the AI safety community, who otherwise would have to create their own signalling mechanisms. Thus it would be a bad outcome if all these internships got filled up with people with ordinary academic credentials and no "weirdness points," as admissions incentives might push towards. Strong admissions risk-aversion may also indicate that we have lots of talent, and not enough spots (more dakka required).

Such internships would take nontrivial effort and administrative resources - they're a negative for the research output of the individuals who run them. To align the incentives to make them happen, we'd want top-down funding intended for this activity. This may be complicated by the fact that a lot of research happens within corporations, e.g. at DeepMind. But if people actually try, I suspect there's some way to use money to expand training+signalling internships at corporate centers of AI safety research.

Suppose that we blow open that bottleneck, and we have a bunch of people with some knowledge of cutting-edge research, and credible signals that they can do AI safety work. Where do they go?

Right now there are only a small number of organizations devoted to AI safety research, all with their own idiosyncrasies, and all accepting only a small number of new people. And yet we want most research to happen in organizations rather than alone: Communicating with peers is a good source of ideas. Many projects require the efforts or skillsets of multiple people working together. Organizations can supply hardware, administrative support, or other expertise to allow research to go smoother.

Opportunity 3: Expand the size and scope of existing organizations, perhaps in a hierarchical structure. Can't be done indefinitely (will come back to this), but I don't think we're near the limits.

In addition to increasing the size of existing organizations, we could also found new groups altogether. I won't write that one down yet, because it has some additional complications. Complications that are best explored from a different perspective.

II

If you're a grant-making organization, selectivity is everything. Even if you want to spend more money, if you offer money for AI safety research but have no selection process, a whole bushel of people are going to show up asking for completely pointless grants, and your money will be wasted. But it's hard to filter for people and groups who are going to do useful AI safety research.

So you develop a process. You look at the grantee's credentials and awards. You read their previous work and try to see if it's any good. You ask outside experts for a second opinion, both on the work and on the grantee themselves. Et cetera. This is all a totally normal response to the need to spend limited resources in an uncertain world. But it is a lot of work, and can often end up incentivizing picking "safe bets."

Now let's come back to the unanswered problem of increasing the number of research organizations. In this environment, how does that happen? The fledgling organization would need credentials, previous work, and reputation with high-status experts before ever receiving a grant. The solution is obvious: just have a central group of founders with credentials, work, and reputation ("cred" for short) already attached to them.

Opportunity 4: Entice people who have cred to found new organizations that can get grants and thus increase the amount of money being spent doing work.

This suggests that the number of organizations can only grow exponentially, through a life cycle where researchers join a growing organization, do work, gain cred, and then bud off to form a new group. Is that really necessary, though? What if a certain niche just obviously needs to be filled - can you (assuming you're Joe Schmo with no cred) found an organization to fill it? No, you probably cannot. You at least need some cred - though we can think about pushing the limits later. Grant-making organizations get a bunch of bad requests all the time, and they shouldn't just fund all of them that promise to fill some niche. There are certainly ways to signal that you will do a good job spending grant money even if you utterly lack cred, but those signals might take a lot of effort for grant-making organizations to interpret and compare to other grant opportunities, which brings us to the "vetting" bottleneck mentioned at the start of the post. Being vetting-constrained means that grant-making organizations don't have the institutional capability to comb through all the signals you might be trying to send, nor can they do detailed follow-up on each funded project sufficient to keep the principal-agent problem in check. So they don't fund Joe Schmo.

But if grant-making orgs are vetting-constrained, why can't they just grow? Or if they want to give more money and the number of research organizations with cred is limited, why can't those grantees just grow arbitrarily?

Both of these problems are actually pretty similar to the problem of growing the number of organizations. When you hire a new person, they need supervision and mentoring from a person with trust and know-how within your organization or else they're probably going to mess up, unless they already have cred. This limits how quickly organizations can scale. Thus we can't just wait until research organizations are most needed to grow them - if we want more growth in the future we need growth now.
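To make the compounding concrete, here is a minimal sketch of a toy model, not a forecast. It assumes (these are illustrative assumptions, not figures from any real organization) that every trusted member can bring a fixed number of newcomers up to speed per year, and that those newcomers count as trusted mentors the following year. The function name and parameters are hypothetical.

```python
def trusted_headcount(initial: int, years: int, mentees_per_mentor: int = 1) -> int:
    """Headcount after `years` of mentoring-limited growth.

    Toy assumption: each trusted member trains `mentees_per_mentor`
    newcomers per year, and newcomers become trusted after that year.
    """
    headcount = initial
    for _ in range(years):
        headcount += headcount * mentees_per_mentor
    return headcount


if __name__ == "__main__":
    # Same 10-year horizon, but the second group waits 3 years before growing.
    early = trusted_headcount(initial=5, years=10)
    late = trusted_headcount(initial=5, years=7)
    print(early, late, early // late)
```

Under these assumptions the group that waits three years ends up smaller by a factor of 2^3 = 8, which is the sense in which growth later requires growth now.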

Opportunity 5: Write a blog post urging established organizations to actually try to grow (in a reasonable manner), because their intrinsic growth rate is an important limiting factor in turning money into AI safety.

All of the above has been in the regime of weak vetting. What would change if we made grant-makers' vetting capabilities very strong? My mental image of strong vetting is grant-makers being able to have a long conversation with an applicant, every day for a week, rather than a 1-hour interview. Or being able to spend four days of work evaluating the feasibility of a project proposal, and coming back to the proposer with a list of suggestions to talk over. Or having the resources to follow up on how your money is being spent on a weekly basis, with a trusted person available to help the grantee or step in if things aren't going to plan. If this kind of power was used for good, it would open up the ability to fund good projects that previously would have been lost in the noise (though if used for ill it could be used to gatekeep for existing interests). This would decrease the reliance on cred and other signals, and increase the possible growth rate, closer to the limits from "talent" growth.

An organization capable of doing this level of vetting blurs the line between a grant-making organization and a centralized research hub. In fact, this fits into a picture where research organizations have stronger vetting capabilities for individuals than grant-making organizations do for research organizations. In a growing field, we might expect to see a lot of intriguing but hard-to-evaluate research take place as part of organizations but not get independently funded.

Strong vetting would be impressive, but it might not be as cost-effective as just lowering standards, particularly for smaller grants. It's like a stock portfolio - it's fine to invest in lots of things that individually have high variance so long as they're uncorrelated. But a major factor in how low your standards can be is how well weak vetting works at separating genuine applicants from frauds. I don't know much about this, so I'll leave this topic to others.
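The portfolio intuition can be checked with a little arithmetic. This is a minimal sketch, assuming every grant's payoff has the same standard deviation and every pair of grants has the same correlation; the numbers and the function name are illustrative, not data about real grants.

```python
import math


def portfolio_std(per_grant_std: float, n_grants: int, correlation: float = 0.0) -> float:
    """Standard deviation of the average payoff across n equally sized grants.

    Uses Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n, which holds when all
    grants share one standard deviation and one pairwise correlation
    (illustrative assumptions).
    """
    variance = per_grant_std ** 2 * (1 + (n_grants - 1) * correlation) / n_grants
    return math.sqrt(variance)


if __name__ == "__main__":
    print(portfolio_std(10.0, 1))          # a single high-variance grant
    print(portfolio_std(10.0, 100))        # 100 uncorrelated grants
    print(portfolio_std(10.0, 100, 0.5))   # 100 grants that tend to fail together
```

With 100 uncorrelated grants the spread of the average falls by a factor of 10. If the grants are correlated (say, they all rest on the same mistaken assumption), most of that benefit disappears.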

The arbitrary growth of research organizations also raises some questions about research agendas (in the sense of a single, cohesive vision). A common pattern of thought is that if we have more organizations, and established organizations have different teams of people working under their umbrellas, then all these groups of people need different things to do, and that might be a bottleneck. On this view, what's best is when groups are working towards a single vision, articulated by the leader, and if we don't have enough visions we shouldn't found more organizations.

I think this picture makes a lot of sense for engineering problems, but not a lot of sense for blue-sky research. Look at the established research organizations - FHI, MIRI, etc. - they have a lot of people working on a lot of different things. What's important for a research group is trust and synergy; the "top-down vision" model is just a special case of synergy that arises when the problem is easily broken into hierarchical parts and we need high levels of interoperability, like an engineering problem. We're not at that stage yet with AI safety or even many of its subproblems, so we shouldn't limit ourselves to organizations with single cohesive visions.

III

Let's flip the script one last time - if you don't have enough cred to do whatever you want, but you think we need more organizations doing AI safety work, is there some special type you can found? I think the answer is yes.

The basic ingredient is something that's both easy to understand and easy to verify. I'm staying at the EA Hotel right now, so it's the example that comes to mind. The concept can be explained in about 10 seconds (it's a hotel that hosts people working on EA causes), and if you want me to send you some pictures I can just as quickly verify that (wonder of wonders) there is a hotel full of EAs here. But the day-to-day work of administrating the hotel is still nontrivial, and requires a small team funded by grant money.

This is the sort of organization that is potentially foundable even without much cred - you promise something very straightforward, and then you deliver that thing quickly, and the value comes from its maintenance or continuation. When I put it that way, now maybe it sounds more like Our World In Data's covid stats. Or like 80kh's advising services. Or like organizations promising various meta-level analyses, intended for easy consumption and evaluation by the grant-makers themselves.

Opportunity 6: If lacking cred, found new organizations with really, extremely legible objectives.

The organization-level corollary of this is that organizations can spend money faster if they spend it on extremely legible stuff (goods and services) rather than new hires. But as they say, sometimes things that are expensive are worse. Overall this post has been very crassly focusing on what can get funded, not what should get funded, but I can be pretty confident that researchers give a lot more bang per buck than a bigger facilities budget. Though perhaps this won't always be true; maybe in the future important problems will get solved, reducing researcher importance, while demand for compute balloons, increasing costs.

I think I can afford to be this crass because I trust the readers of this post to try to do good things. The current distribution of AI safety research is pretty satisfactory to me given what I perceive to be the constraints; we just need more. It turned out that when I wrote this post about the dynamics of more, I didn't need to say much about the content of the research. This isn't to say I don't have hot takes, but my takes will have to stay hot for another day.

Thanks for reading.

Thanks to Jason Green-Lowe, Guillaume Corlouer, and Heye Groß for feedback and discussion at CEEALAR.

26 comments

A lot of the difficulty comes from the fact that AI safety is a problem we don't understand; the field is pre-paradigmatic. We don't know how best to frame the problem. We don't know what questions to ask, what approximations to make, how to break the problem into good subproblems, what to pay attention to or what to ignore. All of these issues are themselves major open problems.

That makes a lot of the usual scaling-up approaches hard.

  • We can't write a textbook, because we don't know what needs to go in it. The one thing we know for sure is that the things we might currently think to write down are not sufficient; we do not yet have all the pieces.
  • We can't scale up existing training programs, because we don't quite know what skills/knowledge are crucial for AI safety research. We do know that no current program trains quite the right mix of skills/knowledge; otherwise AI safety would already fit neatly into that paradigm.
  • Existing organizations have limited ability to absorb more people, because they don't understand the problem well enough to effectively break it into pieces which can be pursued in parallel. Figuring that out is part of what existing orgs are trying to do.
  • Previous bullet also applies to people who have some legible achievements and could found a new org.
  • Finally, nobody currently knows how to formulate the core problems of the field in terms of highly legible objectives. Again, that's a major open problem.

I learned about the abundance of available resources this past spring. My own approach to leveraging more resources is to try to scale up the meta-level skills of specializing in problems we don't understand. That's largely what the framing practicum material is for - this is what a "textbook" looks like for fields where we don't yet know what the textbook should contain, because figuring out the right framing tools is itself part of the problem.

I think if you're in the early stages of a big project, like founding a pre-paradigmatic field, it often makes sense to be very breadth-first. You can save a lot of time trying to understand the broad contours of solution space before you get too deeply invested in a particular approach.

I think this can even be seen at the microscale (e.g. I was coaching someone on how to solve leetcode problems the other day, and he said my most valuable tip was to brainstorm several different approaches before exploring any one approach in depth). But it really shines at the macroscale ("you built entirely the wrong product because you didn't spend enough time talking to customers and exploring the space of potential offerings in a breadth-first way").

One caveat is that breadth-first works best if you have a good heuristic. For example, if someone with less than a year of programming experience was practicing leetcode problems, I wouldn't emphasize the importance of brainstorming multiple approaches as much, because I wouldn't expect them to have a well-developed intuition for which approaches will work best. For someone like that, I might recommend going depth-first almost at random until their intuition is developed (random rollouts in the context of Monte Carlo tree search are a related notion). I think there is actually some psych research showing that more experienced engineers will spend more time going breadth-first at the beginning of a project.

A synthesis of the above is: if AI safety is pre-paradigmatic, we want lots of people exploring a lot of different directions. That lets us understand the broad contours better, and also collects data to help refine our intuitions.

IMO the AI safety community has historically not been great at going breadth-first, e.g. investing a lot of effort in the early days into decision theory stuff which has lately become less fashionable. I also think people are overconfident in their intuitions about what will work, relative to the amount of time which has been spent going depth-first and trying to work out details related to "random" proposals.

In terms of turning money into AI safety, this strategy is "embarrassingly parallel" in the sense that it doesn't require anyone to wait for a standard textbook or training program, or get supervision from some critical person. In fact, having a standard curriculum or a standard supervisor could be counterproductive, since it gets people anchored on a particular frame, which means a less broad area gets explored. If there has to be central coordination, it seems better to make a giant list of literatures which could provide insight, then assign each literature to a particular researcher to acquire expertise in.

After doing parallel exploration, we could do a reduction tree. Imagine if we ran an AI safety tournament where you could sign up as "red team", "blue team", or "judge". At each stage, we generate tuples of (red player, blue player, judge) at random and put them in a video call or a Google Doc. The blue player tries to make a proposal, the red player tries to break it, the judge tries to figure out who won. Select the strongest players on each team at each stage and have them advance to the next stage, until you're left with the very best proposals and the very most difficult to solve issues. Then focus attention on breaking those proposals / solving those issues.
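The tournament described above could be sketched roughly as follows. This is a toy illustration only: `judge_match` stands in for a human judge and here just flips a seeded coin, the player names are made up, and advancing the top half of each team after a few random pairings is one possible reading of "select the strongest players". It also assumes equally sized teams with unique names.

```python
import random


def judge_match(red, blue, judge, rng):
    """Stand-in for a human judge: True if the red player broke the proposal."""
    return rng.random() < 0.5


def run_stage(reds, blues, judges, rng, rounds=3):
    """Play a few randomly paired rounds, then keep the top half of each team."""
    wins = {player: 0 for player in reds + blues}
    for _ in range(rounds):
        opponents = blues[:]
        rng.shuffle(opponents)
        for red, blue in zip(reds, opponents):
            judge = rng.choice(judges)
            winner = red if judge_match(red, blue, judge, rng) else blue
            wins[winner] += 1
    keep = max(1, len(reds) // 2)

    def strongest(team):
        return sorted(team, key=lambda player: wins[player], reverse=True)[:keep]

    return strongest(reds), strongest(blues)


def run_tournament(reds, blues, judges, seed=0):
    """Repeat stages until one red and one blue player remain."""
    if not reds or len(reds) != len(blues):
        raise ValueError("this sketch assumes non-empty, equally sized teams")
    rng = random.Random(seed)
    while len(reds) > 1:
        reds, blues = run_stage(reds, blues, judges, rng)
    return reds[0], blues[0]


if __name__ == "__main__":
    reds = [f"red{i}" for i in range(8)]
    blues = [f"blue{i}" for i in range(8)]
    print(run_tournament(reds, blues, judges=["judge0", "judge1"]))
```

The surviving blue player corresponds to the proposal that held up best, and the surviving red player to the hardest-to-answer objection; in a real version the coin flip would of course be replaced by the judge's actual verdict.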

>I learned about the abundance of available resources this past spring.

I'm curious what this is referring to.

There's apparently a lot of funding looking for useful ways to reduce AI X-risk right now.

Yes, I agree, but I think people still have lots of ideas about local actions that will help us make progress. For example, I have empirical questions about GPT-2 / 3 that I don't have the time to test right now. So I could supervise maybe one person worth of work that just consisted of telling them what to do (though this hypothetical intern should also come up with some of their own ideas). I could not lay out a cohesive vision for other people to follow long-term (at least not very well), but as per my paragraph on cohesive visions, I think it suffices for training to merely have spare ideas lying around, and it suffices for forming an org to merely be fruitful to talk to.

I agree with the bit in the post about how it makes sense to invest in a lot of different approaches by different small teams. Similarly with hiring people to work on various smaller/specific questions. This makes sense at small scale, and there's probably still room to scale it up more at current margins. The problem comes when one tries to pour a lot of money into that sort of approach: spending a lot of money on something is applying optimization pressure, whether we intend to or not, and if we don't know what we're optimizing for then the default thing which happens is that we Goodhart on people trying to look good to whoever's making the funding decisions.

So, yes at small scale and probably at current margins, but this is a strategy which can only scale so far before breaking down.

My Gordon Worley impression: If we don't have a fraud problem, we're not throwing around enough money :P

Fraud also seems like the kind of problem you can address as it comes up. And I suspect just requiring people to take a salary cut is a fairly effective way to filter for idealism.

All you have to do to distract fraudsters is put a list of poorly run software companies where you can get paid more money to work less hard at the top of the application ;-) How many fraudsters would be silly enough to bother with a fraud opportunity that wasn't on the Pareto frontier?

lol this does sound exactly like something I would say!

>The problem comes when one tries to pour a lot of money into that sort of approach

It seems to me that the Goodhart effect is actually stronger if you're granting less money.

Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like "my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don't personally think X is a great idea."

In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections -- and the less funding available, the greater the quantity of rejections.

On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there'd be no optimization pressure at all, and people would just do whatever seemed best to them personally.

EDIT: This appears to be a concrete example of what I'm describing

Another implication John didn't list, is that a certain kind of illegible talent, the kind that can make progress in pre-paradigmatic fields, is crucial. This seems to strongly conflict with the statement in your post:

>Of the bottlenecks I listed above, I am going to mostly ignore talent. IMO, talented people aren't the bottleneck right now, and the other problems we have are more interesting. We need to be able to train people in the details of an area of cutting-edge research. We need a larger number of research groups that can employ those people to work on specific agendas. And perhaps trickiest, we need to do this within a network of reputation and vetting that makes it possible to selectively spend money on good research without warping or stifling the very research it's trying to select for.

Do you think that special sort of talent doesn't exist? Or is abundant? Or isn't the right way to understand the situation? Or what?

The option I continue to be in favor of is just give away more money with less vetting and double down on a hits based approach. Yes, we have to do some minimal vetting to make sure that the work is actually about safety and not capabilities and that the money isn't just being grabbed by scammers, but otherwise if someone honestly wants to work on AI safety, even if it seems dumb or unlikely to work to you, just give them the money and let's see what they produce.

I think we worry way too much about reputation concerns. These seem hypothetical to me, and if we just fund a lot of work some of it will be great and rise to the top, the rest will be mediocre and forgotten or ignored.

I've tried to do what I can in my own way to fund things that seem unlikely to get money from others or that have struggled to get funding, but I have limited resources, so I can only do so much. I hope others will step up and do the same.

>I think we worry way too much about reputation concerns. These seem hypothetical to me, and if we just fund a lot of work some of it will be great and rise to the top, the rest will be mediocre and forgotten or ignored.

I think you're overconfident that mediocre work will be "forgotten or ignored". We don't seem to have reliable metrics for measuring the goodness of alignment work. We have things like post karma and what high-status researchers are willing to say publicly about the work, but IMO these are not reliable metrics for the purpose of detecting mediocre work. (Partially due to Goodhart's law; people who seek funding for alignment research probably tend to optimize for their posts getting high karma and their work getting endorsements from high-status researchers). FWIW I don't think the reputation concerns are merely hypothetical at this point.

I guess, but I think of this as weighing two options:

  • worry about reputation and err towards false negatives in funding
  • don't worry about reputation and err towards false positives in funding

I want to increase the chance that AI safety is addressed, and allowing more false positives in funding (since I assess most false positives to produce neutral outcomes relative to achieving AI safety) seems the better trade-off, so all else equal I prefer to worry less about reputation.

I think there are some arguments that reputation matters long term, but it's not clear we have long enough for that to matter or that funding more stuff would actually hurt reputation, so lacking better arguments I remain convinced we should just fund more stuff more freely.

I've been thinking about how LessWrong might help solve some of the problems you're describing. In particular, I wonder if we can set up programs/mechanisms that would: (a) help new promising people begin making real contributions (not just study/prep) even before they join research groups,  and (b) thereby allow them to gain cred that leads to being hired, getting grants, collaboration, etc.

If anyone has thoughts on how else LessWrong could help in this space, please respond here or message me.

Up until recently I was interning at Non-Linear. A lot of our tasks were just gathering information on various topics. I imagine that it might be useful to have people interning at LW, working on improving the Wiki. I don't know how much credibility this would provide, but my intuition is that an internship is more prestigious than volunteering as internships are generally selective and require a greater time commitment.

My own take (which might be heavily wrong, and I would appreciate any good counterargument):

There is a massive mismatch between the incentives of the field and the incentives of people who could do the field building. Everyone constantly says that they want textbooks and courses and distillation, but I think any researcher without a stable job is really incentivized against that.

Why? Basically because if you want funding for research and/or a research job in alignment, you need good research. I have never seen a single example of good field building deciding such a case over research, and so even if you get kudos, you don't get anything in return for field building (in terms of being able to work on alignment). Worse, you probably spent a shit ton of time on it, which leaves you with less research to show for your grant application or job application, and so significantly reduces your chances of getting funding/jobs to keep working on alignment.

This line of thinking has heavily limited my attempts to do anything to solve the problems of the field. Any time I have a promising idea, I assume it's going to be a net negative on everything related to getting funding/jobs for me, and just see if I can get away with that cost. That has been my reasoning behind the alignment coffee time, mentoring people, and accepting some responsibilities. But I never maintain any of the bigger and important projects and ideas because doing so would torpedo my chances of ever doing alignment research again. Even when I think that they are probably the most important things for the future of alignment in general.

Thinking about it some more, I believe the actual problem I'm pointing out is not that people can't find initial funding (I said elsewhere and still believe it's quite easy to get one) but that we don't really have a plan for longterm funding outside of funding labs. What I mean is that if you're like me and have some funding for a year or two, either you're looking for a job in one of the big labs or... you don't really know. Maybe it's possible to get funding every year for ten years? But that's also very stressful, and not clear whether it is possible.

The related problem is that funding projects is fundamentally a bad idea if there are only a few projects that require these skills but the skills take time and investment to get. Only thinking in terms of funding projects means that you expect someone to stop what they're doing for a year, do that project, and then go back to what they were doing. Except they probably need at least a year of study before being good enough to do it, and also these two years spent on this can cost them basically all their chances of doing anything in their previous career. (And maybe they just want to keep helping.)

A last framing I find both sad and hilarious is that any independent researcher that wants to do field building is basically on the driver side of Parfit's hitchhiker: they can give a ride to the hitchhiker in the desert (the alignment community in need of community) if the hitchhiker pays them when they arrive into town, but it looks pretty clear from the driver's seat that the incentives of the hitchhiker are for not giving the money at that point because it would be wasteful. Hence why the alignment community is unable to get out of the desert.

In safety research labs in academe, we do not have a resource edge compared to the rest of the field.

We do not have large GPU clusters, so we cannot train GPT-2 from scratch or fine-tune large language models in a reasonable amount of time.

We also do not have many research engineers (currently zero) to help us execute projects. Some of us have safety projects from over a year ago on the backlog because there are not enough reliable people to help execute the projects.

These are substantial bottlenecks that more resources could resolve.

>Of the bottlenecks I listed above, I am going to mostly ignore talent. IMO, talented people aren't the bottleneck right now, and the other problems we have are more interesting.

Can you clarify what you mean by this? I see two main possibilities for what you might mean:

  • There are many talented people who want to work on AI alignment, but are doing something else instead.
  • There are many talented people working on AI alignment, but they're not very productive.

If you mean the first one, I think it would be worth it to survey people who are interested in AI alignment but are currently doing something else -- ask each of them, why aren't they working on AI alignment? Have they ever applied for a grant or job in the area? If not, why not? Is money a big concern, such that if it were more freely available they'd start working on AI alignment independently? Or is it that they'd want to join an existing org, but open positions are too scarce?

I think that "There are many talented people who want to work on AI alignment, but are doing something else instead." is likely to be true. I met at least 2 talented people who tried to get into AI Safety but weren't able to because open positions / internships were too scarce. One of them at least tried hard (i.e. he applied for many positions and couldn't find one because of scarcity, despite being one of the top French students in ML). If there were money / positions, I think there is a chance that he would work on AI alignment independently.
Connor Leahy mentions something similar in one of his podcasts as well.

That's the impression I have.

I want to point out that cashing out "talented" might be tricky. My observation is that talent for technical alignment work is not implied/caused by talent in maths and/or ML. It's not bad to have any of this, but I can think of many incredible people in maths/ML I know who seem way less promising to me than some person with the right mindset and approach.

Yeah, I mean the first. Good survey question ideas :)

Make thousands of clones of John von Neumann. Yes, first we will have to get some of his DNA and we will have to figure out exactly how to clone humans. But the payoff, at least over a 30-year timescale, seems much bigger than anything else proposed. On a less ambitious but related front, identify profoundly gifted kids between, say, 8 and 16 and offer them free expert tutoring in return for some of that tutoring being about the importance and challenges of AI safety. The Davidson Institute would be a good place to start to look for such kids.

This makes me wonder if causes trying to recruit very smart kids is already a problem that organizations chosen by parents of smart kids view as an annoyance, and have adaptations against.

Unlikely.  The Davidson Institute does not get asked to recruit such children, and they would be the easiest pathway to such recruitment in the US.