I find trying to find funding or paid roles or even unpaid roles so demoralizing. How do I keep motivated?
I don't want to focus on trying to survey the landscape of funding opportunities and learning to network with people productively. It's so much nicer to just focus on the work I want to be doing, but it seems I either can't make it legible enough fast enough, or it's actually not valuable and I should go do something else with my time.
I want advice. How do I get funding? How do I think about getting funding? How do I stay motivated to keep thinking about how to get funding?
And a different question: How young are you? Are there experienced people who have worked with you and can vouch for the quality of your work / strategic orientation / etc?
I'm 35. You can view my experience on my LinkedIn profile. I was working as a technologist at an automotive company, involved with some AI projects in collaboration with the Vector Institute. That's when GPT-3 was released, prompting me to take the prosaic scaling hypothesis more seriously, change my plan to saving money so I could finish my CS BSc, and shift my career goal to working on technical AI alignment.
While completing my BSc I had the opportunity to focus on my NDISP project, first as a directed studies project supervised by George Tzanetakis, and then extending it into an honours project supervised by Teseo Schneider. George is a professor focused on classical AI and music algorithms; Teseo is a professor focused on graphics algorithms. They are probably the most relevant experienced people who have worked with me and could vouch for the quality of my work, but neither is focused on technical alignment, so they probably cannot vouch for my strategic orientation.
My project was mostly self-driven: attempting to extend and apply the tools introduced in Visualizing Neural Networks with the Grand Tour to the same network examined in Understanding and controlling a maze-solving policy network. This is part of a long-term plan to first build intuition for interactive n-dimensional tools by applying them to relatively easy-to-understand image networks, before applying them to relatively harder-to-understand transformer networks.
I have reached out to some of the authors of those papers and have had brief correspondences with Mingwei Li, TurnTrout, peligrietzer, and Ulisse Mini, but I'm unsure how deeply any of them have looked into my work.
Here's the page describing my projects.
I think the NDISP project has the clearest value proposition. I'm currently rewriting the tool from scratch to be standalone and ready for alpha users. I'd recommend skimming the videos for a sense of the project.
All of my other projects seem to be less legible, with a much lower probability of much greater usefulness. Charitably, they could be described as working on useful paradigm shifts for the field of AI Alignment and rational global coordination. Less charitably, they could be described as a crackpot shouting at clouds. I might describe them as Butterfly Ideas that I really want to get out of the stage of being butterfly ideas, but alas, they keep flapping around me.
My guess is you should get more experience before trying to set your own research directions, especially if they diverge considerably from existing ones. The default is that all research directions are bad, and AI safety is becoming mature enough that good ideas come from experience rather than from first principles. Also in the current environment, automation makes it efficient to execute on good ideas and puts a deadline on gaining experience.
That is commonly given advice, and it makes sense: when you are starting out you don't know what you don't know and can't see the flaws in your own ideas. But on the other hand, coming up with your own ideas is its own skill that may not be trained well by only learning from other people's experience. It's hard to say. I suppose the obvious ideal is to practice coming up with your own ideas while having experienced mentors to critique them.
What kinds of things do you have in mind when you say "get more experience"? I am applying to fellowships but haven't been accepted to any yet. I don't want to do more ML work that doesn't focus on AI alignment if I can help it. I was considering writing some literature reviews. There are also some papers I would like to try replicating.
But if I'm being honest, the things that feel most valuable to me are working on NDISP, OISs, and Maat, or finding other, similar enough projects and contributing to them. I guess I'm gambling with the time I have to focus on these things, and I need to accept that if I'm deciding to focus on projects I think are valuable but other people don't see the value in, then I'll have to keep focusing on them without financial or moral support, and accept the consequences of doing so.
Question to people working on Technical AI Alignment: How are you currently making a living?
I want to, ideally, focus on Technical Alignment Research fulltime and feel a bit lost. Any advice or encouragement, or even discouragement, would be appreciated.
Edit 1: Here is johnswentworth's answer to this question in 2021. I don't know how much the space has changed since then.
Edit 2: Here is Connor Leahy's answer to this question in March 2026. (Present date at time of writing). His answer is pessimistic, but reflects my own understanding of the situation.
How long do you think something should be before it is no longer a quick take and should instead be a top level post? Or is it not about the length? Maybe it's about the amount of research and editing that goes into it?
I don't know the exact cutoff, but I think a decent number of quick takes should just be posts. We already have the Personal Blog tag and I just rely on the moderators deciding if something should be promoted to the frontpage.
I think the question of whether something belongs in shortform vs. a post is separate from whether it's a good post. I was just assuming mods wouldn't promote something they think is too niche or undeveloped. The Personal Blogpost tag specifically calls out "niche topics" and "personal ramblings".
I think I agree. I'm imagining quick takes fill the role of quick, Twitter-like back and forth of idea snippets, and general questions like the one I just posed, but it seems you can write entire article-length content in them, which makes me wonder what different people think the distinction should be.
Giving things a title is in my experience a very difficult process and changes the structure of an essay from a conversational open-ended style to a more "I know where I want the reader to go" situation. Both have their place, but they do feel quite different.
Oh, that's a really interesting answer: the difference is like the difference between a named and an anonymous function. I do think there is a kind of important semiotic power in naming things, so I can understand wanting to avoid that, but I also feel pretty comfortable writing a post and then slapping a random name on it based on how the vibe turned out. This is what I just did with Ball+Gravity has a "Downhill" Preference. It started as a quick take, but then became a long take, so I copied and pasted it into a post and gave it a name. That's also what inspired this question.
We need better de-politicizing technology. The politicization of issues seems very difficult to avoid. Once something becomes politicized, can it be depoliticized? I think we need jargon, communication norms, and platforms that help push back on the politicization of everything.
Relatedly, the notion that politicization harms people's ability to form accurate beliefs and coordinate with one another seems obviously true to me, but I can't easily find citations that straightforwardly support it; instead, it seems like most research takes it as assumed and then investigates specific facets and details. Do you know of any good citations?
Can you expand on what you mean by "politicized" here? Here are some things I think I've heard it used to mean —
I think I should have used the word "polarizing" instead of "politicizing".
I mean the first two also with the implication that people treat these things as quasi-conflicts between quasi-tribes, and so become less likely to focus on what is correct and beneficial and more likely to focus on signalling tribal membership and allegiance.
I think your third bullet point is related, but not necessarily what I'm talking about. Arguing about how society should respond to and think about school shootings is important: school shootings are bad and should be prevented, just like traffic accidents and heart disease are bad and should be prevented. I believe responses like gun control are politicized in that people are likely to pattern-match "gun control" into a quasi-tribal conflict and then respond accordingly, instead of actually thinking about it, or, as should often be done, ignoring it if they are not well versed in the relevant issues. But just talking about issues, and about which parties plan what responses to those issues, isn't necessarily a problem, except insofar as it causes people to start contextualizing the issue as a quasi-tribal conflict.
Maybe instead of "politicized" or "polarized" we need a term like "quasi-tribalized" or "in-group-out-group-conflictized", or something similar but less rhetorically unwieldy.
I think there's a perverse incentive around the learning and usage of math. This is largely coloured by my experiences in calculus class, where it seemed like most students were interested in memorizing how to use the formulas to get the correct answer and thus good grades, without necessarily understanding anything about what the calculations they were doing represented or were used for.
Maybe the problem here is fully from Goodharting on grades, but I wonder if there may be a broader phenomenon.
There are three objects here:
The dynamic I'm hypothesizing is:
In general, (2) is more important than (1), and yet (1) is held in higher esteem than (2). Ideally people have a strong grasp of both (1) and (2), but people are intrinsically and extrinsically motivated to seek (3), and (2) is easier to fake than (1), so people are motivated to fake (2) and put their effort into signalling competence in (1), which would ideally imply competence in (2), but doesn't necessarily.
I feel this relates to what Grant Sanderson of 3b1b talked about in Math's pedagogical curse. But of course I also worry this is something I'm imagining because I feel motivated to try to understand math that is too difficult for my level of skill and I want to rationalize away my incompetencies. Is it imposter syndrome or honest self knowledge?
What do you think? Is (2) more important than (1), or maybe it's not important to be skilled at both, and we need people better at (1) and people better at (2)? Do you agree the dynamic I described exists, and does it feel common, or marginal? Or maybe my entire framing is flawed in some way?
Ever read Surely You're Joking, Mr. Feynman!? Plenty of the stories in there involve someone (or an entire student body) not really understanding extremely basic things about what the hell they were talking about, despite having memorized some formulas and being adept at manipulating them.
Examples include:
To paraphrase from memory: Most people's knowledge is so fragile!
Yeah, knowledge, especially specialized knowledge, seems tragically fragile. I did read the book many years ago and recall enjoying it. Do you think in those cases:
The problem is not specialized knowledge. All of these examples are basic and fundamental to the subject. Except for the geodesic example[1], I am confident the others are cases of never actually learning what the hell they were talking about; instead of confronting that fundamental gap of understanding, they focused on memorization and rote algebra.
Partly this is the perverse incentive structure of current schools. But I suspect that even if schools didn't actively encourage these problems you'd still see this, and that you need some way to actively fight back. Unfortunately I'm not sure what the key is here.
Some guesses:
I myself didn't immediately know the answer to that one. However, it had been a while since I was looking into general relativity, and I had not gone that deeply, so I forgot the fact that the (spacetime-interval version of) arc length of an observer's path is the proper time of the observer (simply because the observer is not moving in their own frame) and had to figure it out again (though by then I had read Feynman's answer).
I anticipate that the assistant would've been able to tell you that fact without connecting it to rocket flights; I anticipate that I would've as well, and even that, had I taken the time to write down the problem, I would've had a good chance of figuring out the necessary fact. Why am I like this, when the assistant wasn't?
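For concreteness, here is the fact in question written out as a sketch (my own paraphrase in units with c = 1, not Feynman's wording):

```latex
\tau \;=\; \int \sqrt{dt^2 - dx^2 - dy^2 - dz^2}
     \;=\; \int \sqrt{1 - v^2}\, dt \qquad (c = 1)
```

The proper time \(\tau\) along a timelike worldline is just its spacetime arc length, and free-fall (geodesic) worldlines are the ones that extremize it, which is why "which flight plan maximizes the clock reading" is a geodesic question.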
I think it might be about specialized knowledge, because I see indexing and cross-linking things as its own kind of specialized knowledge. It seems like all of your examples are focused on creating bigger, denser networks of cross-domain indexing. I think that is great, and I love trying to do it myself! It's highly useful. But is it possible to be useful without it? I can hypothesize that a few people on a team with that skill could make other people who don't have that skill useful...
For example, maybe my co-worker might not understand what Brewster's angle actually means, but if I can rely on them to do the calculations correctly then I can use them to get more work done than I could do alone (hypothetically). If situations like that exist, or are common, then it is actually ok that most students (and employees) are not that interested in actually understanding what the symbolic manipulation they are doing means.
But there are two potential flaws. (1) We are creating more and more capable artificial general intelligence, and doing so may make the people who didn't understand what they were doing deeply enough no longer useful. This is bad under the current prevailing social systems. (2) Schools might not actually be teaching students anything useful if the students do not understand it deeply, with cross-indexing.
(1) is a more general problem... and honestly I'm more worried about misaligned ASI than economic impact, but it is still a pretty important concern.
(2) is definitely true in some regards. Education does seem to function partly as a crude filter sorting people into social strata. But insofar as it actually teaches skills that get used, it might be better to shift emphasis away from "practice applying specialized skill" and towards "learn dense indexes connecting specialized skills to their applications", trusting that people can look up and reference the specialized skill if they need to apply it. People are much more likely to benefit from knowing which skills exist and where they are useful than from having a bunch of skills they will forget because they don't understand how those skills connect back to anything real at all.
But I must confess I think (2) is happening somewhat implicitly, through the way communities of different specialized knowledge produce specialists who connect into cross-domain teams. I think it would benefit from being made more explicit, but that is probably the sort of thing that sociologists and business-management students learn about... I would like to learn more about sociology.
Understanding the symbolic language of math is difficult to master, and easy to evaluate (for someone who has the same skill). That makes it a convenient status marker.
Yeah. I think because of (3) there might be a perverse incentive to seek (1) at the detriment of (2). What do you think?
Yes.
Also, if you learn the symbolic language of math, you join the ranks of people fluent in math.
If you speculate about a purpose, you join a more diffuse set, which includes some deep thinkers but also many crackpots.
So it's like the wisest people are in the latter group, but the people in the former group are smarter on average.
Yeah... it's one of those tricky Bayesian updates with a rare phenomenon.
It would be really great if there were easier, cheaper, and more accurate tests to distinguish crackpots from wise people. Or just better methods of dissuading people from becoming crackpots. Then focusing on purpose could signal wisdom without also, more strongly, signalling crackpottery.
A contradiction that isn't a contradiction:
I hold both of these views:
Why might this seem like a contradiction? Either I should think that more money should be put into technical AI alignment so I and other people can get paid to do it, or I should conclude that AI policy is more important and try to work on that instead.
Why do I believe this is not actually a contradiction? In my worldview, AI is a very important and potentially existentially dangerous technology, and the current shepherds of its development are not handling their responsibility with commensurate wisdom. AI policy is, then, the more important and constrained focus. But I do not believe that technical AI alignment should be completely forgotten, and I do not believe I have the aptitude or desire to do policy work. I intrinsically like research and theory-building.
I wonder how many other people feel the way I feel. It would be quite a problem if it was the majority of us.
Given the existence of a hill and gravity, the roundness of a ball encodes its preference for being at the bottom of the hill. Without the hill and gravity, the roundness of the ball could mean many things.
Given the existence of a ball and gravity, the bottom of a hill encodes a preference for where the ball should be.
Neither the ball nor the hill is modelling the world, yet they steer reality towards repeatable outcomes. Is it wrong to call this a preference even though it does not depend on world modelling?
Is it right to say preferences in this context only exist as the interplay of multiple parts of a system? Are the preferences of world modelling agents fully contained in those agents, or, like the ball and the hill, do the agents preferences exist as an interplay between the world and itself?
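The ball-and-hill dynamic can be caricatured as gradient descent on a height function. This toy sketch (names and numbers are my own illustrative choices, not from the original) shows a system with no world model anywhere that nonetheless steers toward the same repeatable outcome from different starting points:

```python
# Toy "ball on a hill": the state follows the negative slope of a
# height function h(x) = x**2. Nothing here models the world, yet
# the system reliably steers toward the bottom of the hill at x = 0.
def roll(x0, steps=1000, step_size=0.1):
    x = x0
    for _ in range(steps):
        slope = 2 * x           # h'(x) for h(x) = x**2
        x -= step_size * slope  # the ball moves downhill
    return x

print(abs(roll(3.0)) < 1e-9)   # True: reaches the bottom
print(abs(roll(-7.0)) < 1e-9)  # True: same outcome from a different start
```

The "preference" here lives in neither the update rule nor the height function alone but in their interplay, which is the point of the question above.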
Not wrong, if used metaphorically, but I think that "preference", which implies an agent that is aware of options and capable of making choices, maybe muddies whatever you're trying to express. In the case of the ball and the hill, that is not the case. Preference, in ordinary parlance, suggests a selection among options. Often options are qualitative: "I prefer Chocolate Ice-Cream to Strawberry". In economics it's about "optimal choice", which, again, raises the question: do the hill and the ball have the capacity to take alternatives? Is there some utility they are maximizing?
Spinoza says that if a stone which has been projected through the air, had consciousness, it would believe that it was moving of its own free will. I add this only, that the stone would be right. The impulse given it is for the stone what the motive is for me, and what in the case of the stone appears as cohesion, gravitation, rigidity, is in its inner nature the same as that which I recognise in myself as will, and what the stone also, if knowledge were given to it, would recognise as will. -
Arthur Schopenhauer
If your objective is to describe the most probable or likely outcome of a system that is better modeled using Daniel Dennett's Physical Stance than his Intentional Stance, then I'd avoid using "preference". In the example you've given, there's nothing to suggest the ball will be anywhere else, there's nothing to suggest it has "options" therefore there are no preferences to speak of.
Preference implies alternative outcomes.
Thanks for engaging : )
I think the phrase "aware of and capable of making choices" hides most of the complexity I am interested in focusing on. What really is awareness? The word "aware" implies that it is a boolean thing, like "either some system is aware or it is not", but I think that's wrong. I think "awareness" varies in amount and kind.
And "making choices" is similarly complicated. The ball could stay put or roll, but it chooses to roll. You could say it never had the choice to do anything but roll because the mechanism which determined its choice to roll, its roundness, is so obvious and exposed, but suppose I understood the mechanisms of some human's mind well enough to predict that human's actions with the same accuracy? Would it be right to suggest that humans do not make choices since the choices were determined by the mechanisms by which humans choose?
It seems to me the Physical Stance and the Intentional Stance both describe the same systems. It is my feeling that in order to understand complex decision making systems, such as humans, AI, and sociotechnical systems, we need to have language that can describe them clearly. So I guess what I might be doing here is trying to force an exploration of the boundary between where the physical and intentional stance apply.
I could believe that a symbolic representation of other objects is the quality required to say that a system is aware, but then, is roundness symbolic? Where is the distinction between symbolic and mechanical?
Likewise, I could very much imagine that alternative outcomes are required for preference, but then whether some system has preferences depends on how well it is understood. That has uncomfortable implications. If an ASI understood humans sufficiently well, would that ASI be justified in claiming that humans do not have preferences? I'm much more comfortable admitting that any system that affects outcomes has preferences than denying the preferences of any sufficiently well-understood system.
... Oh, also, I didn't put as much emphasis on it but I really am interested in the question of whether an agent's preferences exist as an interplay between the world and itself. I feel that would have important implications for Agent Foundations and AI Alignment.
but suppose I understood the mechanisms of some human's mind well enough to predict that human's actions with the same accuracy? Would it be right to suggest that humans do not make choices since the choices were determined by the mechanisms by which humans choose?
I don't think we need to suppose... I'd guess you probably do frequently. You have family members, friends, and/or lovers: people of whom you have intimate knowledge and whose behavior you have an extremely good track record of predicting?
If an ASI understood humans sufficiently well, would that ASI be justified in claiming that humans do not have preferences? I'm much more comfortable admitting any system that affects outcomes has preferences than denying the preferences of any sufficiently well understood system.
I don't think it would be any more justified claiming that humans don't have preferences than I can claim that anybody I know really well doesn't have preferences. If you can predict which newspaper or soft-drink your Father buys from the store, that doesn't mean he had no choice in the matter. If there's no other newspapers in stock, or only one brand of soft-drink - then he has no choice. But, realistically, you can't choose alternatives you're not aware of.
A simple test of whether something is a choice is to ask: "if the agent believed something else or had very different desires, would the outcome be very different?" If, no matter what the agent desires or believes, the outcome would always be the same, then that's not a choice.
If someone goes up to the fridge at a store and there's an orange drink and a strawberry drink, and you know they love Orange flavor, and so they buy the orange, that's still a choice. But - imagine you knew they HATED Orange, or that they loved Strawberry instead - hypothetically they would then choose the Strawberry. Therefore it was a choice.
Conversely, imagine a spectator high up on an embankment at a motor race. They are in a sea of people, a mere speck as seen from the track, so they have no earthly way of affecting the result of the race. There are twenty racers. It doesn't matter who this single spectator desires or wishes to win - the result is hypothetically always the same. This is not a choice.
I am not familiar with any credible model where a ball can "desire" to go up and, contingent on that alone, it does. This is why it is best represented by the "physical" stance in Dennett's typology.
The word "aware" implies that it is a boolean thing, like "either some system is aware or it is not", but I think that's wrong. I think "awareness" varies in amount and kind.
Abstractly, I agree with this, and I think there's a spectrum of awareness in ways that do influence choices. But I'm struggling for examples right now... the best that comes to mind is when a couple are deciding where to go to dinner, and one of them says "let's have Italian" knowing there is an Italian restaurant. They aren't strictly aware of the menu - it could include Ragù, Calzone, Osso Buco or dozens of other choices - but they are aware of at least one restaurant nearby, in their price range, that does "Italian".
Likewise preferences themselves often exist in parallel. If Orange isn't available, maybe they go for Banana, or Cherry. And likewise choices made are often prompted by complex decision making models that are operating on dozens of different dimensions or factors, even something as simple as buying a shirt - is it comfortable? do I like the pattern or the colour? is the material breathable? What are the washing instructions? etc. etc. etc.
A lot of this is black box analysis. I'm interested in white box analysis. I guess maybe "black box vs white box" means the same thing as "intentional stance vs physical stance".
You speak of knowing the preferences of something, with the implication that you have observed the past behaviour of the system and can infer its future behaviour based on an abstract model of its "intentions" or "preferences". Is this what is meant by the "intentional stance"? I think so, and it is indeed a valid way to examine the world.
But within a person, and within an AI model, there is some mechanism that causes those preferences to be so... and that is the kind of understanding I am focusing on: predicting the choice of orange flavour not based on past behaviour involving flavour choices or hearing statements about preferences, but by examining the body, brain, and brain state with enough skill to see how and where the preference for orange is encoded, and predicting based on that. Is this the "physical stance"? In that case I think I might be interested in merging the physical and intentional stances.
For example, I might know that balls roll down hills not because I have analyzed them as physical objects, but because I have observed them roll down hills before. Is this not the same as the intentional stance: modelling the preferences of the ball based on its past behaviour?
On the other hand, it isn't too difficult to understand how roundness vs. flatness affects rolling. The flat object stays where it is put and the round object rolls down the hill. You can see mechanically why this is the case, but you could also just as well know it by inference, and I would suggest that most people learn about physical laws first by observing the behaviours of objects, only later in life learning about things like friction and force and gravity.
I haven't noticed anything you have said that categorically distinguishes the behaviour of an object rolling down a hill from the behaviour of a person expressing their preferences by choosing what they want.
In worlds where leaders X of some social movement condemn taboo action Y, you would expect to see X condemning Y. But in worlds where X supports Y, you would expect them to be unable to publicly state that support, and so in these worlds too you would expect X to condemn Y. This creates a problem: if condemning Y is what X does both in worlds where they support Y and in worlds where they condemn it, how can X actually communicate with supporters who find themselves wondering if they can support the movement given Y? Do sufficiently strong taboos against Y make it impossible to communicate true condemnation clearly? That would suck.
False condemnation: "Yes, of course Y is a bad thing, although it's quite understandable how people may feel driven to do it."
Thought while doing Transformers from Scratch:
Inside a transformer block, the MLP embeds into a higher dimensional space, applies an activation function, and then projects back into a lower dimensional space. From a semantic distribution perspective, here are three intuitions for why this makes sense:
The more dimensions you have the more complicated "knots" you can untangle. With 2d, you can pull the centre out of a line. With 3d, you can pull the centre out of a disk. With 4d, you can pull the centre out of a ball. Etc...
Each dimension in the activation space is like an independent fold applied to the semantic space. The more folds you have the more you can transform the semantics.
The embedding space can be thought of as partitions of independent copies of the input space, each transformed independently. The higher the dimension of the embedding space, the more copies of larger subspaces of the input (including the entire input space) can be independently transformed.
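A minimal numerical sketch of that embed-up / activate / project-down shape (the dimensions and initialization here are illustrative choices on my part, not taken from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32  # illustrative sizes; e.g. GPT-2 uses 768 -> 3072

W_up = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_model)     # embed up
W_down = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_hidden)  # project back

def gelu(x):
    # tanh approximation of GELU, a common transformer-MLP nonlinearity
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x):
    # (..., d_model) -> (..., d_model), routed through the wider space
    return gelu(x @ W_up) @ W_down

tokens = rng.normal(size=(4, d_model))  # four token embeddings
out = mlp(tokens)
print(out.shape)  # (4, 8): same shape as the input, transformed in 32 dims
```

The input and output live in the same 8-dimensional space; all the "untangling" happens in the 32-dimensional space in between, which is the shape the three intuitions above are trying to explain.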