Question to people working on Technical AI Alignment: How are you currently making a living?
I want to, ideally, focus on Technical Alignment Research full-time, and I feel a bit lost. Any advice or encouragement, or even discouragement, would be appreciated.
Edit: Here is johnswentworth's answer to this question in 2021. I don't know how much the space has changed since then.
How long do you think something should be before it is no longer a quick take and should instead be a top level post? Or is it not about the length? Maybe it's about the amount of research and editing that goes into it?
I don't know the exact cutoff, but I think a decent number of quick takes should just be posts. We already have the Personal Blog tag, and I just rely on the moderators to decide whether something should be promoted to the frontpage.
I think the question of whether something belongs in shortform vs. a post is separate from whether it's a good post. I was just assuming mods wouldn't promote something they think is too niche or undeveloped. The Personal Blogpost tag specifically calls out "niche topics" and "personal ramblings".
I think I agree. I'm imagining quick takes fill the role of quick, Twitter-like back and forth of idea snippets, and general questions like the one I just posed, but it seems like you can write entire article-length content in them, which makes me wonder what different people think the distinction should be.
Giving things a title is, in my experience, a very difficult process, and it changes the structure of an essay from a conversational, open-ended style to a more "I know where I want the reader to go" situation. Both have their place, but they do feel quite different.
Oh, that's a really interesting answer: the difference is like the difference between a named and an anonymous function. I do think there is a kind of important semiotic power in naming things, so I can understand wanting to avoid that, but I also feel pretty comfortable writing a post and then slapping a name on it based on how the vibe turned out. This is what I just did with Ball+Gravity has a "Downhill" Preference. It started as a quick take, but then became a long take, so I copied and pasted it into a post and gave it a name. That's also what inspired this question.
A contradiction that isn't a contradiction:
I hold both of these views:
Why might this seem like a contradiction? Either I should think that more money should be put into technical AI alignment so I and other people can get paid to do it, or I should conclude that AI policy is more important and try to work on that instead.
Why do I believe this is not actually a contradiction? In my worldview, AI is a very important and potentially existentially dangerous technology, and the current shepherds of its development are not handling their responsibility with commensurate wisdom. AI policy is, then, the more important and constrained focus. But I do not believe that technical AI alignment should be completely forgotten, and I do not believe I have the aptitude or desire to do policy work. I intrinsically like research and theory building.
I wonder how many other people feel the way I feel. It would be quite a problem if it were the majority of us.
Given the existence of a hill and gravity, the roundness of a ball encodes its preference for being at the bottom of the hill. Without the hill and gravity, the roundness of the ball could mean many things.
Given the existence of a ball and gravity, the bottom of a hill encodes a preference for where the ball should be.
Neither the ball nor the hill is modelling the world, yet together they steer reality towards repeatable outcomes. Is it wrong to call this a preference even though it does not depend on world modelling?
Is it right to say preferences in this context only exist as the interplay of multiple parts of a system? Are the preferences of world-modelling agents fully contained in those agents, or, like the ball and the hill, do an agent's preferences exist as an interplay between itself and the world?
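As a toy illustration of steering without modelling, here's a minimal sketch I'm adding for concreteness; the quadratic hill, the friction constant, and the step sizes are arbitrary choices, not part of the original framing. Wherever the ball starts, it ends up near the bottom without representing anything.

```python
def hill_slope(x):
    """Slope of a simple quadratic valley whose bottom is at x = 0."""
    return 2 * x

def roll(x0, steps=5000, dt=0.01, friction=0.5):
    """Crude ball-on-a-hill dynamics: gravity pulls downhill, friction damps motion."""
    x, v = x0, 0.0
    for _ in range(steps):
        v += (-hill_slope(x) - friction * v) * dt  # acceleration from slope plus drag
        x += v * dt
    return x

# Wherever the ball starts, it settles near the bottom (x = 0):
for start in (-3.0, 0.5, 2.7):
    print(start, "->", round(roll(start), 3))
```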
Not wrong, if used metaphorically, but I think that "preference", which implies an agent that is aware of options and capable of making choices, maybe muddies whatever you're trying to express. The ball and the hill are not that kind of thing. Preference, in ordinary parlance, suggests ranking one option above others. Often the options are qualitative: "I prefer Chocolate Ice-Cream to Strawberry". In economics it's about "optimal choice", which again raises the question: do the hill and the ball have the capacity to take alternatives? Is there some utility they are maximizing?
Spinoza says that if a stone which has been projected through the air, had consciousness, it would believe that it was moving of its own free will. I add this only, that the stone would be right. The impulse given it is for the stone what the motive is for me, and what in the case of the stone appears as cohesion, gravitation, rigidity, is in its inner nature the same as that which I recognise in myself as will, and what the stone also, if knowledge were given to it, would recognise as will. -
Arthur Schopenhauer
If your objective is to describe the most probable or likely outcome of a system that is better modeled using Daniel Dennett's Physical Stance than his Intentional Stance, then I'd avoid using "preference". In the example you've given, there's nothing to suggest the ball will be anywhere else and nothing to suggest it has "options"; therefore there are no preferences to speak of.
Preference implies alternative outcomes.
Thanks for engaging : )
I think the phrase "aware of and capable of making choices" hides most of the complexity I am interested in focusing on. What really is awareness? The word "aware" implies that it is a boolean thing, like "either some system is aware or it is not", but I think that's wrong. I think "awareness" varies in amount and kind.
And "making choices" is similarly complicated. The ball could stay put or roll, but it chooses to roll. You could say it never had the choice to do anything but roll because the mechanism which determined its choice to roll, its roundness, is so obvious and exposed, but suppose I understood the mechanisms of some human's mind well enough to predict that human's actions with the same accuracy? Would it be right to suggest that humans do not make choices since the choices were determined by the mechanisms by which humans choose?
It seems to me the Physical Stance and the Intentional Stance both describe the same systems. It is my feeling that in order to understand complex decision-making systems, such as humans, AIs, and sociotechnical systems, we need language that can describe them clearly. So I guess what I might be doing here is trying to force an exploration of the boundary between where the physical stance and the intentional stance apply.
I could believe that a symbolic representation of other objects is the quality required to say that a system is aware, but then, is roundness symbolic? Where is the distinction between symbolic and mechanical?
Likewise, I could very much imagine that alternative outcomes are required for preference, but then whether some system has preferences depends on how well understood it is. That has uncomfortable implications. If an ASI understood humans sufficiently well, would that ASI be justified in claiming that humans do not have preferences? I'm much more comfortable admitting any system that affects outcomes has preferences than denying the preferences of any sufficiently well understood system.
... Oh, also, I didn't put as much emphasis on it, but I really am interested in the question of whether an agent's preferences exist as an interplay between the world and itself. I feel that would have important implications for Agent Foundations and AI Alignment.
but suppose I understood the mechanisms of some human's mind well enough to predict that human's actions with the same accuracy? Would it be right to suggest that humans do not make choices since the choices were determined by the mechanisms by which humans choose?
I don't think we need to suppose... I'd guess you probably do this frequently. You have family members, friends, and/or lovers, or people of whom you have intimate knowledge and whose behavior you have an extremely good track record of predicting?
If an ASI understood humans sufficiently well, would that ASI be justified in claiming that humans do not have preferences? I'm much more comfortable admitting any system that affects outcomes has preferences than denying the preferences of any sufficiently well understood system.
I don't think it would be any more justified in claiming that humans don't have preferences than I would be in claiming that anybody I know really well doesn't have preferences. If you can predict which newspaper or soft drink your father buys from the store, that doesn't mean he had no choice in the matter. If there are no other newspapers in stock, or only one brand of soft drink, then he has no choice. But, realistically, you can't choose alternatives you're not aware of.
A simple test of whether something is a choice or not is to ask: "if the agent believed something else or had very different desires, would the outcome be very different?" If, no matter what the agent desires or believes, the outcome would always be the same, then it's not a choice.
If someone goes up to the fridge at a store and there's an orange drink and a strawberry drink, and you know they love orange flavor, so they buy the orange, that's still a choice. But imagine you knew they HATED orange, or loved strawberry instead: hypothetically they would then choose the strawberry. Therefore it was a choice.
Conversely, imagine a spectator high up on an embankment at a motor race. They are in a sea of people, a mere speck as seen from the track, so they have no earthly way of affecting the result of the race. There are twenty racers. It doesn't matter which one this single spectator desires or wishes to win; the result is hypothetically always the same. This is not a choice.
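To make that test concrete, here's a minimal sketch; the drink_outcome and race_outcome functions are made-up stand-ins for the two examples above, not anyone's actual model. The idea is to hold the world fixed, vary the agent's desires, and see whether the outcome changes.

```python
def drink_outcome(desire):
    """The shopper buys whichever stocked flavour they desire."""
    stocked = {"orange", "strawberry"}
    return desire if desire in stocked else "nothing"

def race_outcome(desire):
    """The spectator's desire has no influence on who wins the race."""
    return "racer 7 wins"

def is_choice(outcome_fn, possible_desires):
    """Counterfactual test: it's a choice iff varying the agent's desires
    (or beliefs) can change the outcome."""
    return len({outcome_fn(d) for d in possible_desires}) > 1

desires = ["orange", "strawberry", "cherry"]
print(is_choice(drink_outcome, desires))  # True: the outcome tracks the desire
print(is_choice(race_outcome, desires))   # False: the outcome never varies
```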
I am not familiar with any credible model where a ball can "desire" to go up and, contingent on that alone, it does. This is why it is best represented by the "physical" stance in Dennett's typology.
The word "aware" implies that it is a boolean thing, like "either some system is aware or it is not", but I think that's wrong. I think "awareness" varies in amount and kind.
Abstractly, I agree with this, and I think there's a spectrum of awareness in ways that do influence choices. But I'm struggling for examples right now... the best that comes to mind is a couple deciding where to go to dinner. One of them says "let's have Italian" knowing there is an Italian restaurant nearby. They aren't strictly aware of the menu, which could include Ragù, Calzone, Osso Buco, or dozens of other choices, but they are aware of at least one restaurant nearby, in their price range, that does "Italian".
Likewise, preferences themselves often exist in parallel. If orange isn't available, maybe they go for banana, or cherry. And likewise, choices are often prompted by complex decision-making models operating on dozens of different dimensions or factors, even for something as simple as buying a shirt: is it comfortable? Do I like the pattern or the colour? Is the material breathable? What are the washing instructions? Etc.
A lot of this is black box analysis. I'm interested in white box analysis. I guess maybe "black box vs white box" means the same thing as "intentional stance vs physical stance".
You speak of knowing the preferences of something, with the implication that you have observed the past behaviour of the system and can infer its future behaviour based on an abstract model of its "intentions" or "preferences". Is this what is meant by the "intentional stance"? I think so, and it is indeed a valid way to examine the world.
But within a person, and within an AI model, there is some mechanism that causes those preferences to be so... and that is the kind of understanding I am focusing on: predicting the choice of orange flavour not from past behaviour involving flavour choices or statements about preferences, but by examining the body, brain, and brain state with enough skill to see how and where the preference for orange is encoded, and predicting based on that. Is this the "physical stance"? In that case I think I might be interested in merging the physical and intentional stances.
For example, I might know that balls roll down hills not because I have analyzed them as physical objects, but because I have observed them roll down hills before. Is this not the same as the intentional stance? Modelling the preferences of the ball based on its past behaviour?
On the other hand, it isn't too difficult to understand how roundness vs flatness affects rolling. The flat object stays where it is put and the round object rolls down the hill. You can see mechanically why this is the case, but you could also just as well know it by inference, and I would suggest that most people learn about physical laws first by observing the behaviours of objects and only later in life learn about things like friction and force and gravity.
I haven't noticed anything you have said that categorically distinguishes the behaviour of an object rolling down a hill from the behaviour of a person expressing their preferences by choosing what they want.
I think there's a perverse incentive around the learning and usage of math. This is majorly coloured by my experiences in calculus class, where it seemed like most students were interested in memorizing how to use the formulas to get the correct answers, and thereby good grades, without necessarily understanding anything about what the calculations they were doing represented or were used for.
Maybe the problem here is entirely Goodharting on grades, but I wonder if there is a broader phenomenon.
There are three objects here:
The dynamic I'm hypothesizing is:
In general, (2) is more important than (1), and (1) is held in higher esteem than (2). Ideally people have a strong grasp of both (1) and (2), but people are intrinsically and extrinsically motivated to seek (3), and (2) is easier to fake than (1), so people are motivated to fake (2) and put their effort into signalling competence in (1), which would ideally imply competence in (2), but doesn't necessarily.
I feel this relates to what Grant Sanderson of 3b1b talked about in Math's pedagogical curse. But of course I also worry this is something I'm imagining because I feel motivated to try to understand math that is too difficult for my level of skill and I want to rationalize away my incompetencies. Is it imposter syndrome or honest self knowledge?
What do you think? Is (2) more important than (1), or maybe it's not important to be skilled at both, and we need people better at (1) and people better at (2)? Do you agree the dynamic I described exists, and does it feel common, or marginal? Or maybe my entire framing is flawed in some way?
Thought while doing Transformers from Scratch:
Inside a transformer block, the MLP embeds into a higher-dimensional space, applies an activation function, and then projects back down to the original dimension. From a semantic distribution perspective, here are three intuitions for why this makes sense:
The more dimensions you have the more complicated "knots" you can untangle. With 2d, you can pull the centre out of a line. With 3d, you can pull the centre out of a disk. With 4d, you can pull the centre out of a ball. Etc...
Each dimension in the activation space is like an independent fold applied to the semantic space. The more folds you have the more you can transform the semantics.
The embedding space can be thought of as partitioned into independent copies of subspaces of the input space, each transformed independently. The higher the dimension of the embedding space, the more copies of larger subspaces of the input (up to and including the entire input space) can be independently transformed.
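For concreteness, here's a minimal sketch of the block being described, assuming a PyTorch-style implementation; the 4x expansion factor and GELU are common defaults (as in GPT-2-style models), not anything the intuitions above depend on.

```python
import torch
import torch.nn as nn

class TransformerMLP(nn.Module):
    """The MLP inside a transformer block: up-project, nonlinearity, down-project."""
    def __init__(self, d_model: int, expansion: int = 4):
        super().__init__()
        self.up = nn.Linear(d_model, expansion * d_model)    # embed into the higher-dimensional space
        self.act = nn.GELU()                                  # activation applied in the wide space
        self.down = nn.Linear(expansion * d_model, d_model)   # project back to the residual-stream width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))

mlp = TransformerMLP(d_model=512)
x = torch.randn(2, 16, 512)   # (batch, sequence, d_model)
print(mlp(x).shape)           # torch.Size([2, 16, 512]): same width going out as coming in
```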