Giving unsolicited advice and criticism is a very good credible signal of respect
I have often heard it claimed that giving advice is a bad idea because most people don't take it well and won't actually learn from it.
Giving unsolicited advice/criticism risks:
- the person liking you less
- the person thinking you are stupid (especially if the advice is bad or obvious)
- offending them
People benefit from others liking them and not thinking they are stupid, so these are real costs. Some people also don't like offending others.
So clearly it's only worth giving someone advice or criticism if you think at least some of the following are true:
- they are rational enough not to take offense
- they are good at taking advice and criticism
- they value honest feedback even when they disagree with it
The above points all reflect a superior attitude compared to the average person. And so, if you choose to give someone advice or criticism despite all the associated risks, you are credibly signaling that you think they have these positive traits.
Not giving unsolicited advice and criticism is selfish
The "giving advice is bad" meme is just a version of "being sycophantic is good"—you personally benefit when others like you and so often it's useful to suck up to people.
Even the risk that your interlocutor is offended is not a real risk to their wellbeing—people dislike offending others because it feels uncomfortable to them. Being offended is not actually meaningfully harmful to the offended party.
No doubt sycophancy and the fear of expressing potentially friendship-damaging truths allow negative patterns of behavior to continue unimpeded, but I think you've missed the two most necessary factors in determining whether advice, solicited or unsolicited, is a net benefit to the recipient:
1. You sufficiently understand and have the expertise to comment on their situation, and
2. You can offer new understanding they aren't already privy to.
Perhaps the situations where I envision advice being given are different from yours?
The problem I notice with most unsolicited advice is that it's something the recipient is already aware of. (The classic sitcom example: someone touches a hot dish and after the fact is told "careful, that pan is hot." Is it good advice? Well, in the sense that it is truthful, maybe. But the burn having already happened, it is no longer useful.) This is why it annoys people, and this is why it is taken as an insult to their intelligence.
A lot of people have already heard the generic or obvious advice, and there may be many reasons why they aren't following it.[1] Most of the time, hearing this generic advice repeated will not benefit them even if all the qualities you enumerate hold: that you're willing to accept the cost of giving advice, that they are rational enough not to take offense, that they are good at taking advice and criticism, and that they value honest feedback even when they disagree.
Take this example exchange:
A: "Why are you using the grill to make melted cheese, we have a toaster oven."
B: "the toaster is shorted out, it's broken"
You must sufficiently understand the recipient's situation if you are to have any hope of improving it. If you don't know what they know about the toaster oven, then unsolicited advice can't help.
Another major problem I've found with unsolicited advice is that it lacks fine-grained execution detail. My least favourite advice as a freelancing creative is "you need to get your name out there." Where is "there"? On that big nebulous internet? How does that help me exactly? It only reinforces the fact that the material I am putting online isn't reaching my current interlocutor, but it doesn't give me any clues about how to go about remedying that.
Advice, for it to be useful, needs more than just sympathy and care for the person's well-being; it needs understanding of the situation which is the cause of their behavior.
My personal metric for the "quality" of advice is how actionable it is. This means it can't be post-facto (like the sitcom hot pan), it needs to understand causes and context (such as why they aren't using the toaster oven), and most importantly it needs to suggest explicit actions that can be taken to change the situation (and which actions the recipient can or can't take can only be determined by properly understanding their situation and the causes of their behavior).
Caveat: I'm sure there's a genre of fringe cases where repetition becomes "the medium is the message" - that is they do need to hear it again. But there's a certain point where doing the same thing again and again and expecting a different result is quite stark raving mad.
Related to (2) is that telling someone you disapprove or think less of them for something, i.e. criticizing without providing any advice at all, is also a good signal of respect, because you are providing them with possibly useful information at the risk of them liking you less or making you feel uncomfortable.
In my opinion, this misses the crucial dynamic that the costs of giving advice go up significantly if you care about what the other person thinks of you, which is correlated with respect, status, and power. I personally think that giving advice is good, that if given tactfully many people take it well, and I also often enjoy giving it, so I will generally try to do this wherever possible unless there's a clear reason not to, especially in the context of e.g. interpretability research. But I'm much more cautious if I'm talking to someone who seems important, considers themselves high status, has power over me, etc. I think this is a large part of why people can feel offended by receiving advice. There can be some implicit sense of "you are too stupid to have thought of this", especially if the advice is bad or obvious.
Another important facet is that most people are not (competent) utilitarians about social interactions, so you cannot accurately infer their beliefs with reasoning like this.
Fair, there’s a real tension between signaling that you think someone has a good mindset (a form of intellectual respect) and signaling that you are scared of someone’s power over you or that you care a lot about their opinion of you.
I noticed feeling a little unsatisfied and worried about this advice. I think it pattern matches with people who are savvy with status games or subtle bullying that allows for plausible deniability ("I'm just trying to help! You're being too sensitive."). I think people's heuristic of perceiving criticisms as threatening seems somewhat justified most of the time.
To be clear, I tentatively define respect as the act of (a) evaluating a person as having an amount of value and welfare that is just as important as yours, (b) believing that this person's value and welfare is worth caring about, and (c) treating them as such. You don't have to admire or like a person to respect them. Here are some actions that connote disrespect (or indignity): torture, murder, confinement, physical abuse, verbal abuse, causing a person's social standing to drop unnecessarily, etc. Having said that, I'm still not satisfied with this definition, but it's the best I can come up with so far.
Maybe you've thought about this already or I've missed some implicit assumptions, but let me try to explain by first using Buck's experience as an example:
A lot of the helpful criticism I’ve gotten over the last few years was from people who were being kind of unreasonable and unfair.
One simple example of this is that one time someone (who I’d engaged with for many hours) told me he didn’t take my ideas seriously because I had blue hair. On the one hand, fuck that guy; on the other hand, it’s pretty helpful that he told me that, and I'm grateful to him for telling me.
I interpret this as Buck (a) being appreciative of a criticism that seems unreasonable and unfair, yet (b) feeling that his need for respect wasn't fulfilled--I would probably say "fuck that guy" too if they thought my opinions don't matter in any situation due to the color of my hair.
I could imagine Buck's interlocutor passing your above conditions:
I could also imagine Buck's interlocutor doing a cost-benefit analysis and believing the associated costs you mentioned above are worth it. And yet, Buck was still at least a bit offended, and I think it would be reasonable to believe that this person's criticism was actually not a good credible signal of respect.
One may argue that Buck isn't being rational; if he were, he wouldn't be offended: "Huh, this guy believed that the benefits of giving that criticism outweigh the costs of me liking him less, thinking that he is stupid, and being offended. Seems like a credible signal of respect."
I mean, Buck was appreciative of that advice, but advice being valuable is not necessarily a credible signal of respect. I could imagine a boss giving valuable advice that still checks all your conditions, but doing it in an abusive manner.
My tentative version of unsolicited advice that's also a good credible signal of respect would have more of the following conditions met:
I might be misunderstanding you though, so happy to update!
And thanks for writing this! I do think you are on to something--I do want to get better at feedback giving and receiving, and if done well and at a higher frequency (this might be what you're pointing to), could make me more impactful.
How people respond tells you something about them, so you don't necessarily need to start with a clear picture of how they might respond.
Also, I think advice is the wrong framing for things that are useful to give, it's better to make sure people have the knowledge and skills to figure out the things they seem to need to figure out. Similarly to the "show, don't tell" of educational discussions, you want to present the arguments and not the conclusions, let alone explicitly insist that the other person is wrong about the conclusions. Or better yet, promote the skills that let them assemble the arguments on their own, without needing to concretely present the arguments.
It might help to give the arguments and even conclusions or advice eventually, after everything else is done, but it's not the essential part and might be pointless or needlessly confrontational if the conclusions they arrive at happen to differ.
Any rule about when to give advice has to be robust to people going on and on to lecture you about Jesus because they truly and sincerely want to keep you out of Hell. (Or lecture about veganism, or EA, or politics.)
More generally, social rules about good manners have to apply to everyone--both to people with correct beliefs and to people with incorrect ones. Just like not letting the police break into everyone's houses catches fewer criminals (when the police are right) but protects innocent people (when the police are wrong), not giving advice helps fewer people (when the advice giver is right) but saves people from arrogant know-it-alls and meme plagues (when the advice giver is wrong).
I think this discussion about advice is very fruitful. I think the existing comments do a great job of characterizing why someone might reasonably be offended. So if we take that as the given situation: you want to help people, project respect, but don't want it to come off the wrong way, what could you do?
My partial answer to this is merely sharing your own authentic experience of why you are personally persuaded by the content of the advice, and allowing them to internalize that evidence and derive inferences for themselves. At social gatherings, the people in my life do this: just sharing stories, sometimes horror stories where the point is so obvious that it doesn't need explicit statement. And it feels like a genuine form of social currency to faithfully report on your experiences. This reminds me of "Replace the Symbol with the Substance"[1], where the advice is the symbol and the experience is the substance.
So I wonder if that's part of it: creating the same change in the person anyway, all the while mitigating the risk of condescension. The dynamics of the relationship also complicate analyzing the situation, as does the type of social setting the advice is delivered in. And probably a bunch more factors I haven't thought of yet.
[1]: https://www.lesswrong.com/posts/GKfPL6LQFgB49FEnv/replace-the-symbol-with-the-substance
Insightful. Glad you wrote it.
I enjoyed the combination of "these are real costs" and "positive impact is worth the cost."
I found this insightful, "...reflect a superior attitude...give...advice or criticism...signaling...they have...these positive traits"
I think the challenge lies in categorizing people as "superior" and "average". I like the use of labels since it helps the conversation, but I wonder if it is too limiting. Perhaps, context and topic are important dimensions worthy of consideration as well. I can imagine real people responding differently given more variables, such as context and topic.
Bottom line: I loved it!
On people's arguments against embryo selection
A recent NYT article about Orchid's embryo selection program triggered a backlash on X that surprised me, with people expressing disgust and moral disapproval at the idea of embryo selection. The arguments generally fell into two categories:
(1) "The murder argument" Embryo selection is bad because it involves creating and then discarding embryos, which is like murdering whole humans. This argument also implies regular IVF, without selection, is also bad. Most proponents of this argument believe that the point of fertilization marks a key point when the entity starts to have moral value, i.e. they don't ascribe the same value to sperm and eggs.
(2) "The egalitarian argument" Embryo selection is bad because the embryos are not granted the equal chance of being born they deserve. "Equal chance" here is probably not quite the correct phrase/is a bit of a strawman (because of course fitter embryos have a naturally higher chance of being born). Proponents of this argument believe that intervening on the natural probability of any particular embryo being born is anti-egalitarian and this is bad. By selecting for certain traits we are saying people with those traits are more deserving of life, and this is unethical/wrong.
At face value, both of these arguments are valid. If you buy the premises ("embryos have the moral value of whole humans", "egalitarianism is good") then the arguments make sense. However, I think it's hard to justify moral value beginning at the point of fertilization.
On argument (1):
If we define murder as "killing live things" and decide that murder is bad (an intuitive decision), then "the murder argument" holds up. However, I don't think we actually think of murder as "killing live things" in real life. We don't condemn killing bacteria as murder. The anti-IVF people don't condemn killing sperm or egg cells as murder. So the crux here is not whether the embryo is alive, but rather whether it is of moral value. Proponents of this argument claim that the embryo is basically equivalent to a full human life. But to make this claim, you must appeal to its potential. It's clear that in its current state, an embryo is not a full human. The bundle of cells has no ability to function as a human, no sensations, no thoughts, no pain, no happiness, no ability to survive or grow on its own. We just know that, given the right conditions, the potential for a human life exists. But as soon as we start arguing about how the potential of something grants it moral value, it becomes difficult to draw the line arbitrarily at fertilization. From the point of view of potential humans, you can't deny sperm and eggs moral value. In fact, every moment a woman spends not pregnant is a moment she is ridding the world of potential humans.
On argument (2):
If you grant the premise that any purposeful intervention on the probabilities of embryos being born is unethical because it violates some sacred egalitarian principle then it's hard to refute argument (2). Scott Alexander has argued that encouraging a woman to rehabilitate from alcoholism before getting pregnant is equivalent to preferring the healthy baby over the baby with fetal alcohol syndrome, something argument (2) proponents oppose. However, I think this is a strawman. The egalitarians think every already-produced embryo should be given as equal a chance as possible. They are not discussing identity changes of potential embryos. However, again we run into the "moral value from potential" problem. Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
So in summary, I think it's difficult to justify valuing embryos without appealing to their potential, which leads us to value earlier stages of potential humans. Under this view, it's a moral imperative not to prevent the existence of any potential humans, which looks like maximizing the number of offspring you have. Or as stated in this xeet:
every combo of sperm + egg that can exist should exist. we must get to the singularity so that we can print out all possible humans and live on an incredibly alive 200 story high coast to coast techno favela
People like to have clear-cut moral heuristics like "killing is bad." This gives them an easy guide to making a morally correct decision and an easy guide to judging others' actions as moral or immoral. This requires simplifying multidimensional situations into easily legible scenarios where a binary decision can be made. Thus you see people equating embryo disposal to first-degree murder, and others advocating for third-trimester abortion rights.
Regarding egalitarian-like arguments, I suspect many express opposition to embryo selection not because it’s a consequence of a positive philosophy that they state and believe and defend, but because they have a negative philosophy that tells them what positions are to be attacked.
I suspect that if you put together the whole list of what they attack, there would be no coherent philosophy that justifies it (or perhaps there would be one, but they would not endorse it).
There is more than zero logic to what is to be attacked and what isn’t, but it has more to do with “Can you successfully smear your opponent as an oppressor, or as one who supports doctrines that enable oppression; and therefore evil or, at best, ignorant if they immediately admit fault and repent; in other words, can you win this rhetorical fight?” than with “Does this argument, or its opposite, follow from common moral premises, data, and logical steps?”.
In this case, it’s like, if you state that humans with blindness or whatever have less moral worth than fully healthy humans, then you are to be attacked; and at least in the minds of these people, selecting embryos of the one kind over the other is close enough that you are also to be attacked.
(Confidence: 75%)
Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
Some people believe embryos have souls, which may impact their moral judgement. A soul can be considered a "full human life" in moral terms. I think attributing this purely to potential human life may not be accurate, since intuitions about essentialist notions of continuity of selfhood can often be fairly strong among certain people.
I appreciate the pursuit of non-strawman understandings of misgivings around reprogenetics, and the pursuit of addressing them.
I don't feel I understand the people who talk about embryo selection as "killing embryos" or "choosing who lives and dies", but I want to and have tried, so I'll throw some thoughts into the mix.
First: Maybe take a look at: https://www.thenewatlantis.com/publications/the-anti-theology-of-the-body
Hart, IIUC, argues that wanting to choose who will live and who won't means you're evil and therefore shouldn't be making such choices. I think his argument is ultimately stupid, so maybe I still don't get it. But anyway, I think it's an importantly different sort of argument than the two you present. It's an indictment of the character of the choosers.
Second: When I tried to empathize with "life/soul starts at conception", what I got was:
People talk about meditation/mindfulness practices making them more aware of physical sensations. In general, having "heightened awareness" is often associated with processing more raw sense data but in a simple way. I'd like to propose an alternative version of "heightened awareness" that results from consciously knowing more information. The idea is that the more you know, the more you notice. You spot more patterns, make more connections, see more detail and structure in the world.
Compare two guys walking through the forest: one is a classically "mindful" type, he is very aware of the smells and sounds and sensations, but the awareness is raw, it doesn't come with a great deal of conscious thought. The second is an expert in botany and birdwatching. Every plant and bird in the forest has interest and meaning to him. The forest smells help him predict what grows around the corner, the sounds connect to his mental map of birds' migratory routes.
Sometimes people imply that AI is making general knowledge obsolete, but they miss this angle—knowledge enables heightened conscious awareness of what is happening around you. The fact that you can look stuff up on Google, or ask an AI assistant, does not actually lodge that information in your brain in a way that lets you see richer structure in the world. Only actually knowing does that.
Yeah, two people can read the same Wikipedia page, and get different levels of understanding. The same is true for reading the same AI output. No matter how nicely the AI puts it, either it connects with something in your brain or it doesn't.
In theory, with a superhuman general AI, we could say something like "hey, AI, teach me enough to fully appreciate the thing that you just wrote" (with enough patience and a way to reduce hallucinations, we might be able to achieve a similar effect even with current AIs), but most people probably won't bother.
Perhaps it's that those people say AI is making general knowledge obsolete because it reduces the social value or status of possessing general knowledge by making it an abundant resource. As you said though, the fact that people have access to that abundant resource doesn't mean they understand how to properly make use of it. The capability to understand is still a scarce resource.
I think people who predict significant AI progress and automation often underestimate how human domain experts will continue to be useful for oversight, auditing, accountability, keeping things robustly on track, and setting high-level strategy.
Having "humans in the loop" will be critical for ensuring alignment and robustness, and I think people will realize this, creating demand for skilled human experts who can supervise and direct AIs.
(I may be responding to a strawman here, but my impression is that many people talk as if in the future most cognitive/white-collar work will be automated and there'll be basically no demand for human domain experts in any technical field, for example.)
Oversight, auditing, and accountability are jobs. Agriculture shows that 95% of jobs going away is not the problem. But AI might be better at the new jobs as well, without any window of opportunity where humans are initially doing them and AI needs to catch up. Instead it's AI that starts doing all the new things well first and humans get no opportunity to become competitive at anything, old or new, ever again.
Even formulation of aligned high-level tasks and intent alignment of AIs make sense as jobs that could be done well by misaligned AIs for instrumental reasons. Which is not even deceptive alignment, but still plausibly segues into gradual disempowerment or sharp left turn.
I think this criticism doesn't make sense without some description of the AI progress it's conditioning on. E.g. in a Tyler Cowen world, I agree. In an Eliezer world I disagree.
Inspired by a number of posts discussing owning capital + AI, I'll share my own simplistic prediction on this topic:
Unless there is a hostile AI takeover, humans will be able to continue having and enforcing laws, including the law that only humans can own and collect rent from resources. Things like energy sources, raw materials, and land have inherent limits on their availability - no matter how fast AI progresses we won't be able to create more square feet of land area on earth. By owning these resources, you'll be able to profit from AI-enabled economic growth as this growth will only increase demand for the physical goods that are key bottlenecks for basically all productive endeavors.
To elaborate further/rephrase: sure, you can replace human programmers with vastly more efficient AI programmers, decreasing the human programmers' value. In a similar fashion you can replace a lot of human labor. But an equivalent replacement for physical space or raw materials for manufacturing does not exist. With an increase in demand for goods caused by a growing economy, these things will become key bottlenecks and scarcity will increase their price. Whoever owns them (some humans) will be collecting a lot of rent.
Even simpler version of the above: economics traditionally divides factors of production into land, labor, capital, entrepreneurship. If labor costs go toward zero you can still hodl some land.
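To make the "hodl some land" intuition concrete, here is a minimal toy sketch (my own illustration with made-up numbers, not anything from the original post): a Cobb-Douglas economy where land is a fixed factor, so under competitive pricing it collects a constant share of output as rent, and that rent grows in absolute terms as AI blows up the effective labor supply.

```python
# Toy Cobb-Douglas sketch (illustrative assumptions only: alpha and A are made up).
# Output Y = A * land^alpha * labor^(1 - alpha). With competitive factor pricing,
# the fixed factor (land) is paid alpha * Y as rent, so as AI multiplies the
# effective labor supply and Y grows, the absolute rent on land grows with it.

def land_rent(land: float, labor: float, alpha: float = 0.05, A: float = 1.0) -> float:
    output = A * land**alpha * labor**(1 - alpha)
    return alpha * output  # land's share of total output

land = 1.0
for labor in [1.0, 1e3, 1e6]:  # AI scaling up the effective labor supply
    print(f"effective labor {labor:9.0e} -> land rent {land_rent(land, labor):12.1f}")
```

Of course this assumes rents stay enforceable and land stays a true bottleneck, which is exactly what the replies below push back on.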
Besides the hostile AI takeover scenario, why could this be wrong (/missing the point)?
Space has resources people don't own. The earth's mantle a couple thousand feet down potentially has resources people don't own. More to the point maybe, I don't think humans will be able to continue enforcing laws barring a hostile takeover in the way you seem to think.
Imagine we find out that aliens are headed for earth and will arrive in a few years. Just from the light emissions their probes and expanding civilisation give off, we can infer that they're obviously more technologically mature than us, probably already engineered themselves to be much smarter than us, and can basically do whatever they want with the atoms that make up our solar system and there's nothing we can do about it. We don't know what they want yet though. Maybe they're friendly?
I think guessing that the aliens will be friendly and share human morality to an extent seems like a pretty specific guess about their minds to be making, and is maybe false more likely than not. But guessing that they don't care about human preferences or well-being but do care about human legal structures, that they won't at all help you or gift you things, also won't disassemble you and your property for its atoms[1], but will try to buy atoms from those whom the atoms belong to according to human legal records, now that strikes me as a really really really specific guess to be making that is very likely false.
Superintelligent AGIs don't start out having giant space infrastructure, but qualitatively, I think they'd very quickly overshadow the collective power of humanity in a similar manner. They can see paths through the future to accomplish their goals much better than we can, routing around attempts by us to oppose them. The force that backs up our laws does not bind them. If you somehow managed to align them, they might want to follow some of our laws, because they care about them. But if someone managed to make them care about the legal system, they probably also managed to make them care about your well-being. Few humans, I think, would not at all care about other humans' welfare, but would care about the rule of law, when choosing what to align their AGI with. That's not a kind of value system that shows up in humans much.
So in that scenario, you don't need a legal claim to part of the pre-existing economy to benefit from the superintelligences' labours. They will gift some of their labour to you. Say the current value of the world economy is some amount X, owned by humans roughly in proportion to how much money they have, and two years after superintelligence the value of the economy is many times X, with the vast majority of the new surplus owned by aligned superintelligences[2] because they created most of that value, and a small fraction owned by rich humans who sold the superintelligence valuable resources and infrastructure to get the new industrial base started faster[3]. The superintelligence will then probably distribute its gains among humans according to some system that either treats conscious minds pretty equally, or follows the idiosyncratic preferences of the faction that aligned it, not according to how large a fraction of the total economy they used to own two years ago. So someone who started out with much more money than you two years ago doesn't have much more money in expectation now than you do.
For its conserved quantum numbers really
Or owned by whomever the superintelligences take orders from.
You can't just demand super high share percentages from the superintelligence in return for that startup capital. It's got all the resource owners in the world as potential bargaining partners to compete with you. And really, the only reason it wouldn't be steering the future into a deal where you get almost nothing, or just stealing all your stuff, is to be nice to you. Decision-theoretically, this is a handout with extra steps, not a negotiation between equals.
A question in my head is what range of fixed points are possible in terms of different numeric ("monetary") economic mechanisms and contracts. Seems to me those are a kind of AI component that has been in use since before computers.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. If the AI systems which enforce these mechanisms don't systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of the remaining humans who own AI stock or land, then the owners can end up with no non-deadly food and a lot of money, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like. If money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. But if the majority of the buying power is AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority; if it isn't, then manipulating humans into buying is the priority.
It seems to me that the economic alignment problem of guaranteeing that everyone is able to reliably spend money only on things that actually match their own preferences, so that sellers can't gain economic power by manipulating customers, is an ongoing serious problem, and it ends up being the weak link in scenarios where AIs manage an economy that uses numeric abstractions and contracts (money, ownership, rent) similar to the current one.
you can replace a lot of human labor. But an equivalent replacement for physical space or raw materials for manufacturing does not exist.
There is a lot of space and raw materials in the universe. AI thinks faster, so technological progress happens faster, which opens up access to new resources shortly after takeoff. Months to years, not decades to centuries.
If, for the sake of argument, we suppose that goods that provide no benefit to humans have no value, then land in space will be less valuable than land on earth until humans settle outside of earth (which I don't believe will happen in the next few decades).
Mining raw materials from space and using them to create value on earth is feasible, but again I'm less confident that this will happen (in an efficient-enough manner that it eliminates scarcity) in as short a timeframe as you predict.
However, I am sympathetic to the general argument here that smart-enough AI is able to find more efficient ways of manufacturing or better approaches to obtaining plentiful energy/materials. How extreme this is will depend on "takeoff speed" which you seem to think will be faster than I do.
land in space will be less valuable than land on earth until humans settle outside of earth (which I don't believe will happen in the next few decades).
Why would it take so long? Is this assuming no ASI?
This is actually true, at least in the short term, with the important caveat of the gears of ascension's comment here:
https://www.lesswrong.com/posts/4hCca952hGKH8Bynt/nina-panickssery-s-shortform#quPNTp46CRMMJoamB
Longer-term, if Adam Brown is correct about how advanced civilizations can change the laws of physics, then effectively no constraints remain on the economy, and the reason you can't collect almost all of the rent is that prices can be driven arbitrarily low:
I don't think "hostile takeover" is a meaningful distinction in the case of AGI. What exactly prevents an AGI from pulling off a plan consisting of 50 absolutely legal moves which ends up with it as US dictator?
Perhaps the term “hostile takeover” was poorly chosen, but this is an example of something I’d call a “hostile takeover”, as I doubt we would want and continue to endorse an AI dictator.
Perhaps “total loss of control” would have been better.
The motte and bailey of transhumanism
Most people on LW, and even most people in the US, are in favor of disease eradication, radical life extension, reduction of pain and suffering. A significant proportion (although likely a minority) are in favor of embryo selection or gene editing to increase intelligence and other desirable traits. I am also in favor of all these things. However, endorsing this form of generally popular transhumanism does not imply that one should endorse humanity’s succession by non-biological entities. Human “uploads” are much riskier than any of the aforementioned interventions—how do we know if we’ve gotten the upload right, how do we make the environment good enough without having to simulate all of physics? Successors that are not based on human emulation are even worse. Deep learning based AIs are detached from the lineage of humanity in a clear way and are unlikely to resemble us internally at all. If you want your descendants to exist (or to continue existing yourself), deep learning based AI is no equivalent.
Succession by non-biological entities is not a natural extension of “regular” transhumanism. It carries altogether new risks and in my opinion would almost certainly go wrong by most current people’s preferences.
The term “posthumanism” is usually used to describe “succession by non-biological entities”, for precisely the reason that it’s a distinct concept, and a distinct philosophy, from “mere” transhumanism.
(For instance, I endorse transhumanism, but am not at all enthusiastic about posthumanism. I don’t really have any interest in being “succeeded” by anything.)
I find this position on ems bizarre. If the upload acts like a human brain, and then also the uploads seem normalish after interacting with them a bunch, I feel totally fine with them.
I also am more optimistic than you about creating AIs that have very different internals but that I think are good successors, though I don't have a strong opinion.
I am not philosophically opposed to ems, I just think they will be very hard to get right (mainly because of the environment part—the em will be interacting with a cheap downgraded version of the real world). I am willing to change my mind on this. I also don’t think we should avoid building ems, but I think it’s highly unlikely an em life will ever be as good as or equivalent to a regular human life so I’d not want my lineage replaced with ems.
In contrast to my point on ems, I do think we should avoid building AIs whose main purpose is to be equivalent to (or exceed) humans in “moral value”/pursue anything that resembles building “AI successors”. Imo the main purpose of AI alignment should be to ensure AIs help us thrive and achieve our goals rather than to attempt to embed our “values” into AIs with the goal of promoting our “values” independently of our existence. (Values is in scare quotes because I don’t think there’s such a thing as human values—individuals differ a lot in their values, goals, and preferences.)
Would you be convinced if you talked to the ems a bunch and they reported normal, happy, fun lives? (Assuming nothing nefarious happened in terms of e.g. modifying their brains to report that.) I think I would find that very convincing. If you wouldn't find that convincing, what would you be worried was missing?
I would find that reasonably convincing, yes (especially because my prior is already that true ems would not have a tendency to report their experiences in a different way from us).
i want drastically upgraded biology, potentially with huge parts of the chemical stack swapped out in ways I can only abstractly characterize now without knowing what the search over viable designs will output. but in place, without switching to another substrate. it's not transhumanism, to my mind, unless it's to an already living person. gene editing isn't transhumanism, it's some other thing; but shoes are transhumanism for the same reason replacing all my cell walls with engineered super-bio nanotech that works near absolute zero is transhumanism. only the faintest of clues what space an ASI would even be looking in to figure out how to do that, but it's the goal in my mind for ultra-low-thermal-cost life. uploads are a silly idea, anyway, computers are just not better at biology than biology. anything you'd do with a computer, once you're advanced enough to know how, you'd rather do by improving biology
computers are just not better at biology than biology. anything you'd do with a computer, once you're advanced enough to know how, you'd rather do by improving biology
I share a similar intuition but I haven't thought about this enough and would be interested in pushback!
it's not transhumanism, to my mind, unless it's to an already living person. gene editing isn't transhumanism
You can do gene editing on adults (example). Also in some sense an embryo is a living person.
IMO the whole "upload" thing changes drastically depending on our understanding of consciousness and continuity of the self (which is currently nearly non-existent). It's like teleportation: I would let neither that nor an upload happen to me willingly unless someone was able to convincingly explain to me precisely how my qualia are associated with my brain and how they're going to move over (rather than just killing me and creating a different entity).
I don't believe it's impossible for an upload to be "me". But I doubt it'd be as easy as simply making a scan of my synapses and calling it a day. If it is, and if that "me" is then also infinitely copiable, I'd be very ambivalent about it (given all the possible ways it could go horribly wrong - see this story or the recent animated show Pantheon for ideas).
So it's definitely a "ok, but" position for me. Would probably feel more comfortable with a "replace my brain bit by bit with artificial functional equivalents" scenario as one that preserves genuine continuity of self.
I think a big reason why uploads may be much worse than regular life is not that the brain scan will be not good enough but that they won’t be able to interact with the real world like you can as a physical human.
Edit: I guess with sufficiently good robotics the ems would be able to interact with the same physical world as us in which case I would be much less worried.
I'd say even simply a simulated physical environment could be good enough to be indistinguishable. As Morpheus put it:
What is real? How do you define 'real'? If you're talking about what you can feel, what you can smell, what you can taste and see, then 'real' is simply electrical signals interpreted by your brain.
Of course, that would require insane amounts of compute, but so would a brain upload in the first place anyway.
I feel like this position is... flimsy? Insubstantial? It's not like I disagree; I just don't understand why you would want to articulate it in this way.
On the one hand, I don't think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo genetically modified to have +9000 IQ going to be meaningfully considered "transhuman" rather than "posthuman"? Are you going to still be you after one billion years of life extension? "Keeping relevant features of you/humanity after enormous biological changes" seems qualitatively the same as "keeping relevant features of you/humanity after mind uploading", i.e., if you know at the gears level which features of biological brains are essential to keep, you have a rough understanding of what you should work on in uploading.
On the other hand, I totally agree that if you don't feel adventurous and you don't want to save the world at the price of your personality's death, it would be a bad idea to undergo uploading in the way that closest-to-modern technology can provide. It just means that you need to wait for more technological progress. If we are in the ballpark of radical life extension, I don't see any reason not to wait 50 years to perfect upload tech, and I don't see any reason why 50 years would not be enough, conditional on at least normally expected technological progress.
The same with AIs. If we can have children who are meaningfully different from us, and who can become even more different in a glorious transhumanist future, I don't see reasons not to have AI children, conditional on their designs preserving all the important relevant features we want to see in our children. The problem is that we are not on track to create such designs, not the conceptual existence of such designs.
And all of this seems to be simply deducible from, and anticipated by, the concept of transhumanism, i.e., the concept that the good future is one filled with beings capable of meaningfully saying that they were Homo sapiens and stopped being Homo sapiens at some point in their life. When you say "I want radical life extension" you immediately run into the question "wait, am I going to be me after one billion years of life extension?" and you start The Way through all the questions about self-identity, the essence of humanity, succession, et cetera.
I am going to post about biouploading soon, where the uploading happens into (or via) a distributed net of my own biological neurons. This combines the good things about uploading (immortality, the ability to be copied, ease of repair) with the good things about being a biological human: preserving infinite complexity, exact sameness of a person, and a guarantee that the bioupload will have human qualia and any other important hidden things which we might otherwise miss.
Like with AGI, risks are a reason to be careful, but not a reason to give up indefinitely on doing it right. I think superintelligence is very likely to precede uploading (unfortunately), and so if humanity is allowed to survive, the risks of making technical mistakes with uploading won't really be an issue.
I don't see how this has anything to do with "succession" though, there is a world of difference between developing options and forcing them on people who don't agree to take them.
Criticism quality-valence bias
Something I've noticed from posting more of my thoughts online:
People who disagree with your conclusion to begin with are more likely to carefully read and point out errors in your reasoning/argumentation, or instances where you've made incorrect factual claims. Whereas people who agree with your conclusion before reading are more likely to consciously or subconsciously gloss over any flaws in your writing because they are onboard with the "broad strokes".
So your best criticism ends up coming with a negative valence, i.e. from people who disagree with your conclusion to begin with.
(LessWrong has much less of this bias than other places, though I still see some of it.)
Thus a better way of framing criticism is to narrowly discuss some issue with reasoning, putting aside any views about the conclusion, leaving its possible reevaluation an implicit exercise for the readers.
Could HGH supplementation in children improve IQ?
I think there's some weak evidence that yes. In some studies where they give HGH for other reasons (a variety of developmental disorders, as well as cases when the child is unusually small or short), an IQ increase or other improved cognitive outcomes are observed. The fact that this occurs in a wide variety of situations indicates that it could be a general effect that could apply to healthy children.
Examples of studies (caveat: produced with the help of ChatGPT, I'm including null results also). Left column bolded when there's a clear cognitive outcome improvement.
| Treatment group | Observed cognitive / IQ effects of HGH | Study link |
|---|---|---|
| **Children with isolated growth hormone deficiency; repeated head circumference and IQ testing during therapy** | IQ increased in parallel with head-size catch-up (small case series, N=4). Exact IQ-point gains not reported in the abstract. | Effect of hGH on head circumference and IQ in isolated growth hormone deficiency |
| Short-stature children (growth hormone deficiency and idiopathic short stature), ages 5–16, followed 3 years during therapy | IQ and achievement scores: no change over 3 years (≈0 IQ-point mean change reported); behavior improved (e.g., total problems ↓, P<.001 in growth hormone deficiency; attention/social/thought problems each P=.001). | Behavior change after growth hormone treatment of children with short stature |
| **Children born small for gestational age, long-term randomized dose-response cohort (≈8 years of therapy)** | Total IQ and “performal” IQ increased from below population norms to within normal range by follow-up (p<0.001). Precise IQ-point means not in abstract. | Intelligence and psychosocial functioning during long-term growth hormone therapy in children born small for gestational age |
| **Children born small for gestational age, randomized, double-blind dose-response trial (1 vs 2 mg/m²/day)** | Total IQ and Block-Design (performance) scores increased (p<0.001). Head-size growth correlated positively with all IQ scores; untreated controls did not show head-size increases. Exact IQ-point changes not in abstract. | Effects of growth hormone treatment on cognitive function and head circumference in children born small for gestational age |
| **Prepubertal short children (mix of growth hormone deficiency and idiopathic short stature), randomized to fixed vs individualized dosing for 24 months** | Full-scale IQ increased with a medium effect size (Cohen’s d ≈0.6) after 24 months; processing speed also improved (d ≈0.4). Exact IQ-point means not provided in abstract. | Growth Hormone Treatment Improves Cognitive Function in Short Children with Growth Hormone Deficiency |
| Children born small for gestational age, randomized to high-dose growth hormone for 2 years vs no treatment | No cognitive benefit over 2 years: IQ unchanged in the treated group; in the untreated group, mean IQ rose (P<.05), but after excluding children with developmental problems, neither group changed significantly. Behavioral checklist scores: no significant change. | Effect of 2 years of high-dose growth hormone therapy on cognitive and psychosocial development in short children born small for gestational age |
| **Prepubertal children with Prader–Willi syndrome, randomized controlled trial (2 years) plus 4-year longitudinal follow-up on therapy** | Prevents decline seen in untreated controls (vocabulary and similarities declined in controls at 2 years, P=.03–.04). Over 4 years on therapy: abstract reasoning (Similarities) and visuospatial skills (Block Design) increased (P=.01 and P=.03). Total IQ stayed stable on therapy vs decline in controls. | Beneficial Effects of Growth Hormone Treatment on Cognition in Children with Prader-Willi Syndrome: A Randomized Controlled Trial and Longitudinal Study |
| **Infants and young children with Prader–Willi syndrome (approximately 52-week therapy; earlier vs later start)** | Mental development improved after 52 weeks; earlier initiation (<9 months) associated with greater mental-development gains than later start. Exact test scores vary by age tool; abstract does not list points. | Early recombinant human growth hormone treatment improves mental development in infants and young children with Prader–Willi syndrome |
| Girls with Turner syndrome in a long-term, double-blind, placebo-controlled height trial (1–7 years of treatment) | No effect on cognitive function; the characteristic nonverbal profile unchanged by therapy. | Absence of growth hormone effects on cognitive function in girls with Turner syndrome |
| Young children with Down syndrome (short clinical trial) | No effect on head circumference or mental or gross motor development during the trial period. | Growth hormone treatment in young children with Down’s syndrome |
| Down syndrome cohort, ~15-year follow-up after early childhood growth hormone | No advantage in brief IQ scores at long-term follow-up; higher scores in multiple cognitive subtests (e.g., Leiter-R, WISC-III subtests) vs controls; larger adult head circumference in previously treated group. | Late effects of early growth hormone treatment in Down syndrome |
On optimizing for intelligibility to humans (copied from substack)
One risk of “vibe-coding” a piece of software with an LLM is that it gets you 90% of the way there, but then you’re stuck—the last 10% of bug fixes, performance improvements, or additional features is really hard to figure out because the AI has written messy, verbose code that both of you struggle to work with. Nevertheless, to delegate software engineering to AI tools is more tempting than ever. Frontier models can spit out almost-perfect complex React apps in just a minute, something that would have taken you hours in the past. And despite the risks, it’s often the right decision to prioritize speed, especially as models get smarter.
There is, of course, a middle ground between “vibe-coding” and good old-fashioned typing-every-character-yourself. You could use LLMs for smart autocomplete, occasionally asking for help with specific functions or decisions, or small and targeted edits. But models don’t seem optimized for this use case. It’s actually difficult to do so—it’s one thing to build an RL environment where the goal is to write code that passes some tests or gets a high preference score. It’s another thing to build an RL environment where the model has to guide a human to do a task, write code that’s easy for humans to build on, or ensure the solution is maximally legible to a human.
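As a purely hypothetical sketch of what such an objective could look like, here is one way a coding-environment reward might mix test correctness with a legibility term. The weight, the test harness, and the `legibility_proxy` heuristic are all stand-ins I'm inventing for illustration, not any real lab's training setup.

```python
# Hypothetical reward for a coding RL environment that scores human legibility
# alongside correctness. All names, weights, and heuristics are illustrative stand-ins.
import ast


def fraction_of_tests_passed(solution_code: str, tests: list[str]) -> float:
    """Run each test snippet against the candidate solution; return the pass rate."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, dict(namespace))  # fresh copy so tests don't interfere
            passed += 1
        except Exception:
            pass
    return passed / len(tests) if tests else 0.0


def legibility_proxy(solution_code: str) -> float:
    """Crude stand-in for a human (or judge-model) legibility rating in [0, 1].

    Here: penalize very long function bodies, a weak proxy for "hard to review".
    A real setup would need actual human or model judgments.
    """
    try:
        tree = ast.parse(solution_code)
    except SyntaxError:
        return 0.0
    lengths = [len(n.body) for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not lengths:
        return 1.0
    return max(0.0, 1.0 - max(lengths) / 50.0)  # 50+ statements in one function -> 0


def reward(solution_code: str, tests: list[str], w_legibility: float = 0.3) -> float:
    """Score a candidate on correctness and legibility, not correctness alone."""
    correctness = fraction_of_tests_passed(solution_code, tests)
    return (1 - w_legibility) * correctness + w_legibility * legibility_proxy(solution_code)
```

The hard part is the legibility signal itself: a static proxy like this is trivially Goodhartable, and getting trustworthy human or judge-model ratings of "easy to build on" at RL scale is precisely what makes this a harder environment to build than "passes the tests".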
Will it become a more general problem that the easiest way for an AI to solve a problem is to produce a solution that humans find particularly hard to understand or work with? Some may say this is not a problem at the limit, when AIs are robustly superhuman at the task, but until then there is a temporary period of slop. Personally, I think this is a problem even when AIs are superhuman because of the importance of human oversight. Optimizing for intelligibility to humans is important for robustness and safety—at least some people should be able to understand and verify AI solutions, or intervene in AI-automated systems when needed.
I wonder, in the unlikely case that AI progress stopped and we were left with AIs exactly as smart as they are now, whether that would completely ruin software development.
We would soon have tons of automatically generated software that is difficult for humans to read. People developing new libraries would be under less pressure to make them legible, because as long as they can be understood by AIs, who cares. Paying a human to figure this out would be unprofitable, because running the AI a thousand times and hoping that it gets it right once would be cheaper. Etc.
Current LLM coding agents are pretty bad at noticing that a new library exists to solve a problem in the first place, and at evaluating whether an unfamiliar library is fit for a given task.
As long as those things remain true, developers of new libraries wouldn't be under much pressure in any direction, besides "pressure to make the LLM think their library is the newest canonical version of some familiar lib".
Think clearly about the current AI training approach trajectory
If you start by discussing what you expect to be the outcome of pretraining + light RLHF then you're not talking about AGI or superintelligence or even the current frontier of how AI models are trained. Powerful, general AI requires serious RL on a diverse range of realistic environments, and the era of this has just begun. Many startups are working on building increasingly complex, diverse, and realistic training environments.
It's kind of funny that so much LessWrong arguing has been about why a base model might start trying to take over the world, when that's beside the point. Of course we will eventually start RL'ing models on hard, real-world goals.
Example post / comment to illustrate what I mean.
What, concretely, is being analogized when we compare AI training to evolution?
People (myself included) often handwave what is being analogized when it comes to comparing evolution to modern ML. Here's my attempt to make it concrete:
One implication of this is that we should not talk about whether one or another species tries to survive and increase in number ("are humans aligned with evolution's goals?") but rather whether genetic material/individual genes are doing so.
Have you read the evolution sequence? I think it does a good job of explaining why the direction of change isn't quite toward stuff that survives and increases in number.