All of tailcalled's Comments + Replies

The model is under development. I would like to discuss these things but in order to say much about how rationalist beliefs are distributed, I need to run some surveys on rationalist beliefs first. Will take some time to write up the results.

If you look at the analytics, it turns out most rationalists do believe you should fight bullying. The model used to construct the test was just wrong.

Thanks.  I guess it's just more evidence that I am not a rationalist.  I did get 13/18 when I took it as best I could.  There was no option for "don't care", so I picked the middle one for things I just didn't connect with, or where it was a conjunction and I agreed with one half and disagreed with the other half.  This was a majority of the "X because Y" questions.

I wish the scoring were a lot more precise - like standard deviations from the mean, or log-distance from the mode.  I also wish you'd show the "correct" range for the ones where I was in range, just so I could see how close to the edge I was.  But really, I wish there were more discussion of the analytics and interpretation of the questions - I currently don't see how/whether I should change my views on the cult based on this.

Specific disagreements with your scoring, likely because of phrasing or misunderstanding of the question:

- Unless they're known to lie/exaggerate, you should believe that they've seen things for which they feel "ghosts" is the best description.  Believe your friends, and liars aren't friends.
- Ok, likely a real disagreement with other poll-takers.  I don't think all or most charity organizations (or non-charity organizations, for that matter) "should" bundle lifestyle choices with employment compensation and mission alignment.  SOME organizations do so, and successfully, but it doesn't generalize AT ALL.
- Maybe an interpretation thing.  "Driving cars", as in the ability to drive a car when useful, CLEARLY expands one's options for independent transportation and location choices.  "Driving cars", as in the societal assumption that it's the only or primary way for people to move about, and that it's necessary for most people to spend a lot of time isolated in their car, is far less desirable.  But it DOES increase independence, just not the good kind of independence.
- Genuinely surprised that people think it's not bad to intentionally destroy value.  I wonder if there's a different availability heuristic

Please remember:

Warning: this is not necessarily an accurate or useful test; it's a test that arose through irresponsible statistics rather than careful thought.

The test is a wild extrapolation based on a little bit of data about rationalists and tons of data about what random nonrationalists believe.

If you want to see what rationalists actually believe, you should view the analytics:

I originally asked people qualitatively what they think the roles of different jobs in society are. Then based on that I made a survey with about 100 questions and found there to be about 5 major factors. I then qualitatively asked people about these factors, which led to me finding additional items that I incorporated in additional surveys. Eventually I had a pool of around 1000 items covering beliefs in various domains, albeit with the same 5-factor structure as originally.
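The factor-extraction step described above can be sketched roughly as follows. This is a rough illustration with made-up data, and the eigenvalue-greater-than-one (Kaiser/scree) heuristic stands in for whatever model selection was actually used:

```python
import numpy as np

# Made-up survey data: n respondents x k items with a planted 5-factor
# structure; we try to recover the factor count from the eigenvalues of
# the item correlation matrix (the classic Kaiser / scree heuristic).
rng = np.random.default_rng(0)
n, k, true_factors = 2000, 30, 5
loadings = rng.normal(size=(k, true_factors))
factor_scores = rng.normal(size=(n, true_factors))
data = factor_scores @ loadings.T + rng.normal(size=(n, k))  # plus item noise

corr = np.corrcoef(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors = int(np.sum(eigvals > 1.0))
print(n_factors)  # should land at (or very near) the planted 5
```

In practice one would also inspect the scree plot and the item loadings rather than trusting a single cutoff.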

I suggested that 20 of the items from different factors should be included in the ... (read more)

According to the model, rationalists should score slightly above 12 on average, and because we expect a wide spread of opinions, the model also says we should expect a lot of rationalists to score exactly 12. So there's nothing funky if you score 12.
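To see why a pile of exactly-12 scores is consistent with a slightly-above-12 mean, here is a toy calculation; the 18-item count is from the test, but the per-item match probability is an assumed illustration, not a value from the model:

```python
from math import comb

# Toy illustration with an assumed per-item probability: 18 items, and
# suppose a rationalist independently matches each item's "correct"
# range with probability 0.67.
n, p = 18, 0.67
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = n * p  # 12.06: slightly above 12
mode = max(range(n + 1), key=lambda k: pmf[k])
print(mean, mode)  # the single most likely score is exactly 12
```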

What does the model predict non-rationalists would score?

My model takes the prevalence of the opinion into account; it's the reason that sometimes you have to e.g. agree strongly and other times you merely have to not-disagree. There are unpopular opinions that the factor model does place correctly; e.g. I can't remember whether I have a question about abolishing the police, but supporting human extinction clearly went under the leftism factor even though leftists also disagreed (because leftists were less likely to disagree and disagreed less strongly in a quantitative sense).
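The prevalence-adjusted scoring idea can be sketched like this; the item names, means, and band width below are hypothetical illustrations, not values from the actual model:

```python
# Hypothetical sketch: the "pass" band for each item is centered on the
# predicted rationalist mean rather than the population mean, so unpopular
# items only require not-disagreeing while popular ones require agreement.
items = {
    "fight_bullying":      {"pop_mean": 4.1, "rat_mean": 4.4},  # popular
    "human_extinction_ok": {"pop_mean": 1.3, "rat_mean": 1.8},  # unpopular
}

def passes(item, response, band=1.0):
    """A 1-5 Likert response counts as rationalist-typical if it lands
    within `band` of the predicted rationalist mean for that item."""
    return abs(response - items[item]["rat_mean"]) <= band

print(passes("fight_bullying", 5))       # strong agreement passes
print(passes("fight_bullying", 3))       # mere neutrality does not
print(passes("human_extinction_ok", 2))  # merely not-disagreeing passes here
```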

I think the broader/fuzzier class poi... (read more)

I agree with these points but as I mentioned in the test:

Warning: this is not necessarily an accurate or useful test; it's a test that arose through irresponsible statistics rather than careful thought.

The reason I made this survey is to get more direct data on how well the model extrapolates (and maybe also to improve the model so it extrapolates better).

Are you predicting the LW responses or is a model you made predicting them? I find this opinion weird, probably because there are multiple reasonable interpretations with quite different truth-values. 

Wait I think I might have biased it in favor of popular opinions, 2 sec

This should be better:

Also if you fill out this table then I can easily generate a corresponding list for any other case (e.g. an individual, the means from a community you've distributed these questions to, the means from a subset of the responses from the survey data, etc.):

Quick response where I will go more in-depth later: based on the mean scores for the bonus political questions and the norms+5-factor model I've gotten from a sample on Prolific, here are some beliefs that likely correlate with being a member of LessWrong:

If you don't handle all of some domain but instead just handle "many" settings within the domain, you're not complete with respect to the domain.

"Complete" implies "general".

Reading the Wikipedia article for "Complete (complexity)," I might have misinterpreted what "complete" technically means. What I was trying to say is "given Sora, you can 'easily' turn it into an agent" in the same way that "given a SAT solver, you can 'easily' turn it into a solver for another NP-complete problem." I changed the title from "OpenAI's Sora is agent-complete" to "OpenAI's Sora is an agent," which I think is less misleading. The most technically-correct title might be "OpenAI's Sora can be transformed into an agent without additional training."

Agent-complete would surely have to mean that it can do just about any task that requires agency, rather than that it can just barely be applied to do the very easiest tasks that require agency. I strongly doubt that Sora is agent-complete in the strong sense.

That sounds more like "AGI-complete" to me. By "agent-complete" I meant that Sora can probably act as an intelligent agent in many non-trivial settings, which is pretty surprising for a video generator!

This would predict that we are good at finding precise information in less sensitive areas, which I don't think we are. Rather, people don't know how to create high-quality precise information, so in most areas discourse gets clogged with junk information, and in some sensitive areas we acknowledge the information is bad and therefore try to not assert much.

That makes sense as a critique of my or Bailey's writing, but "Davis and Bailey's writing is unclear and arguably deceptive given their target audience's knowledge" is a very different claim than "autogynephilia is not a natural abstraction"!!

More specifically, I believe that the reason your and Bailey's writing is unclear given the target audience's knowledge is that the audience lacks the knowledge needed for autogynephilia to be a natural abstraction. And my claim "autogynephilia is not a natural abstraction" is not primarily critiquing you or Bailey... (read more)

I'm saying it's dumb to assert that P. albicaulis isn't a natural abstraction just because most people are ignorant of dendrology and are only paying attention to the shrub vs. tree subspace: if I look at more features of vegetation than just broad shape, I end up needing to formulate P. albicaulis to explain the things some of these woody plants have in common despite their shape.

And I think this is fine if you're one of the approximately 5 people (me, maybe Bailey, maybe Andura, maybe Hsu, maybe you - even this is generous since e.g. I think you natur... (read more)

(Continued in containment thread.)
Gerald Monroe (11d): Factorization is working extremely well, though. (Some tasks may factorize poorly, but package logistics subdivides well. Any task that can be transformed to look like package logistics is similar. I can think of a way to transform most tasks to look like package logistics - do you have a specific example? Fusion reactor construction is package logistics, albeit the design is not.)

I agree with that, but I think it is complicated in the case of autogynephilia. My claim is that there are a handful of different conditions that people look at and go "something had to be carrying whatever information made these people so similar", and the something is not simply autogynephilia in the sense that I talk about it, but rather varies from condition to condition (typically including autogynephilia as one of the causes, but often not the most relevant one).

This seems somewhat relevant to a disagreement I've been having with @Zack_M_Davis about whether autogynephilia is a natural abstraction.

Some foundations for people who are not familiar with the topic: Autogynephilia is a sexual interest in being a woman, in a sense relatively analogous to other sexual interests, such as men's usual sexual interest in women (gynephilia). That is, autogynephiles will have sexual fantasies about being women, will want to be women, and so on.

I argue that autogynephilia is not a natural abstraction because most autogynephiles ... (read more)

Consider the consensus genome of some species of tree. Long before we were able to sequence that genome, we were able to deduce that something-like-it existed. Something had to be carrying whatever information made these trees so similar (inter-breedable). Eventually people isolated DNA as the relevant information-carrier, but even then it was most of a century before we knew the sequence.

That sequence is a natural latent: most of the members of the tree's species are ~independent of each other given the sequence (and some general background info about our world), and the sequence can be estimated pretty well from ~any moderate-sized sample of the trees. Furthermore, we could deduce the existence of that natural latent long before we knew the sequence.

Point of this example: there's a distinction between realizing a certain natural latent variable exists, and knowing the value of that variable. To pick a simpler example: it's the difference between realizing that (P, V, T) mediate between the state of a gas at one time and its state at a later time, vs actually measuring the values of pressure, volume and temperature for the gas.

One thing I should maybe emphasize which my above comment maybe doesn't make clear enough is that "GPTs do imitation learning, which is safe" and "we should do bounded optimization rather than unbounded optimization" are two independent, mostly-unrelated points. More on the latter point is coming up in a post I'm writing, whereas more of my former point is available in links like this.

Doomimir: This is all very interesting, but I don't think it bears much on the reasons we're all going to die. It's all still on the "is" side of the is–ought gap. What makes intelligence useful—and dangerous—isn't a fixed repertoire of behaviors. It's search, optimization—the systematic discovery of new behaviors to achieve goals despite a changing environment. I don't think recent capabilities advances bear on the shape of the alignment challenge because being able to learn complex behavior on the training distribution was never what the problem was about

... (read more)
It is late at night, I can't think clearly, and I may disavow whatever I say right now later on. But your comment that you link to is incredible and contains content that zogs rather than zigs or zags from my perspective and I'm going to re-visit when I can think good. I also want to flag that I have been enjoying your comments when I see them on this site, and find them novel, inquisitive and well-written. Thank you.

For instance, do you think that this case for accident risk comes down to subtle word games?

I think so.

I'm not sure whether the case for risk in general depends on word-games, but the case for x-risk from GPTs sure seems to. I think people came up with those word-games partly in response to people arguing that GPTs give us general AI without x-risk?

It feels to me like the cost would be roughly proportional to how much is lost. Or maybe quadratic or something in how much is lost.

Like on the margin we probably regularly lose and rediscover unimportant physical discoveries, because it's just too expensive to keep track of them. If there was some very important law that was magically lost with nothing else changed, then it would probably not take very long to stitch it together from context and observations.

But these observations would be based on advanced measurement devices that have been constructed i... (read more)

If the masses don't want to invest in learning the truth, then they won't learn it, but you'd need some extremely totalitarian guarding to prevent that. However if the truth happens to permit you to achieve useful things in a subject, then that is leverage that can be used to inspire some part of the masses to learn it.

I see. I think maybe I read it when it came out so I didn't see the update.

Regarding the

Not worth getting into?

I'm guessing it's probably not worth the time to resolve this?


I'd guess it's worth getting into because this disagreement is a symptom of the overall question I have about your approach/view.

Though on the other hand maybe it is not worth getting into because maybe once I publish a description of this you'll basically go "yeah that seems like a reasonable resolution, let's go with that".

Yep, which is basically my point. I can't think of any case where I've seen him discuss the distinction.

From the third paragraph of Reward is not the optimization target:

FFS has a different notion of causality, which is both weaker and stronger in important ways. FFS defines the history of an event X to be the minimal set of factors which uniquely specify an element of X. Then an event Y is thought of as "after" X (written X ≤ Y) if the history of X is a subset of the history of Y.

This makes one huge departure from Bayes nets: no information is allowed to be lost. If we have a system where X directly causes Y, but there is some "noise" in Y which

... (read more)
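To make the quoted history definition concrete, here is a toy computation of my own (not from the FFS posts): take the sample space {0,1}³ with its three coordinate partitions as factors, and find the minimal factor set determining each event.

```python
from itertools import combinations, product

# Toy finite factored set: S = {0,1}^3, factors = the 3 coordinate partitions.
S = list(product([0, 1], repeat=3))

def determined_by(f, factor_idxs):
    """True if f(s) is a function of the coordinates in factor_idxs alone."""
    seen = {}
    for s in S:
        key = tuple(s[i] for i in factor_idxs)
        if key in seen and seen[key] != f(s):
            return False
        seen[key] = f(s)
    return True

def history(f):
    """Smallest set of factor indices that uniquely determines f."""
    for r in range(4):
        for idxs in combinations(range(3), r):
            if determined_by(f, idxs):
                return set(idxs)

X = lambda s: s[0]         # event: value of the first coordinate
Y = lambda s: s[0] ^ s[1]  # event: XOR of the first two coordinates
print(history(X), history(Y))  # {0} and {0, 1}
# history(X) ⊆ history(Y), so in the FFS ordering X is "before" Y.
```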
J Bostock (21d):
I was thinking about causality in terms of forced directional arrows in Bayes nets, rather than in terms of d-separation. I don't think your example as written is helpful, because Bayes nets rely on the independence of variables to do causal inference: X→Y→Z is equivalent to X←Y←Z. It's more important to think about cases like X→Y←Z, where causality can be inferred. If we change this to X̂, Ŷ, Ẑ by adding noise, then we still get a distribution satisfying X̂→Ŷ←Ẑ (as X̂ and Ẑ are still independent). Even if we did have other nodes forcing X→Y→Z (such as a node U which is parent to Y, and another node V which is parent to Z), then I still don't think adding noise lets us swap the orders round.

On the other hand, there are certainly issues in Bayes nets of more elements, particularly the "diamond-shaped" net with arrows W→X, W→Y, X→Z, Y→Z. Here adding noise does prevent effective temporal inference, since, if X̂ and Ŷ are no longer d-separated by Ŵ, we cannot prove from correlations alone that no information goes between them through Ẑ.
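The claim that adding observation noise preserves the collider structure can be checked numerically; this simulation is my own illustration, with arbitrary noise scales:

```python
import random
random.seed(1)

def corr(a, b):
    """Pearson correlation of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

N = 20000
# Collider X -> Y <- Z: X and Z are marginally independent, and they stay
# independent after adding observation noise to each.
X = [random.gauss(0, 1) for _ in range(N)]
Z = [random.gauss(0, 1) for _ in range(N)]
Y = [x + z + random.gauss(0, 0.5) for x, z in zip(X, Z)]
Xh = [x + random.gauss(0, 0.5) for x in X]  # noisy X-hat
Zh = [z + random.gauss(0, 0.5) for z in Z]  # noisy Z-hat
print(round(abs(corr(Xh, Zh)), 3))  # ≈ 0: the collider survives the noise
```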

I think it makes sense to have a specific word for the thing where you update the weights after the network has given an output (or variants thereof, e.g. DPO). TurnTrout seems basically correct in saying that it's common for rationalists to mistakenly think the network will be consequentialistically aiming to get a lot of these updates, even though it really won't.

On the other hand I think TurnTrout lacks a story for what happens with stuff like DreamerV3.

As far as I understand, "reward is not the optimization target" is about model-free RL, while DreamerV3 is model-based.

I feel like it's gotta be pretty tricky to fully eliminate the possibility that someone had counterfactual impact when there was simultaneous invention, if the person who had counterfactual impact had been talking about their partially developed ideas beforehand. It could lead to others developing them further.

I know a ton about psychometrics when applied to humans, and I've been thinking on and off about whether some of these methods could be applied to neural networks. Overall I'm bearish about the prospects, but I eventually got one idea that stayed quite promising even after thinking about it for a while. I've been distracted from implementing it by an idea for solving the alignment problem, though, so I've been shelving it for a bit, but if anyone less distractable wants to collaborate then I'd be up for that. It needs to wait until I've published my idea o... (read more)

Yeah, my intuition is that a field "inspired by psychometrics" but not really psychometrics as it exists now will spawn. Some sort of neural psychometry - models dedicated to assessing the safety of other models, something like that.  For the sake of this art project, and since you know a lot about psychometrics, would you have another test to recommend and provide? I'm particularly interested in clinical tests like the Minnesota Multiphasic Personality Inventory-2.

How robust is the information that infertility rates are rising?

To be sure, I'm not an expert on the topic. Declines in male fertility I think are regarded as real, though I haven't examined the primary sources. Regarding female fertility, this report from Norway outlines the trend that I vaguely thought was representative of most of the developed world over the last 100 years.  Female fertility is trickier to measure, since female fertility and age are strongly correlated, and women have been having kids later, so it's important (and likely tricky) to disentangle this confounder from the data.
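The age-confounding point can be illustrated with a toy simulation; every number here is made up purely to show the mechanism, not to model real fertility data:

```python
import random
random.seed(2)

# Toy illustration of the age confounder: age-specific fecundity is held
# fixed across eras, but attempting conception later in life raises the
# measured "infertility rate" with no change in the underlying biology.
def fecundity(age):
    # assumed chance of conceiving within a year of trying, by age
    return 0.9 if age < 30 else 0.6 if age < 35 else 0.35

def infertility_rate(mean_age_at_attempt, n=50_000):
    failures = 0
    for _ in range(n):
        age = max(20, min(44, random.gauss(mean_age_at_attempt, 3)))
        if random.random() > fecundity(age):
            failures += 1
    return failures / n

early = infertility_rate(27)  # assumed earlier-era age at first attempt
late = infertility_rate(33)   # assumed modern, later age at first attempt
print(early, late)  # the raw rate rises even though fecundity(age) is fixed
```

This is exactly why the raw trend has to be age-adjusted before concluding anything about biology.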

I think one mindset that may be healthy is to remember:

Reality is too complex to be described well by a single idea (meme/etc.). If one responds to this by forcing each idea presented to be as good an approximation of reality as possible, then that causes all the ideas to become "colorless and blurry", as any specific detail would be biased when considered on its own.

Therefore, one cannot really fight about whether an idea is biased in isolation. Rather, the goal should be to create a bag of ideas which in totality is as informative about a subject as poss... (read more)

I'd expect that to depend heavily on the definition of "good done" and "cutting corners". For some definitions I'd expect a positive correlation and other definitions I'd expect a negative correlation.

I could buy something like this with the continuous time limit.

I just mean if you want to extend this to cover things outside of the shutdown problem. Like you might want to request the AI to build you a fusion power plant, or cook you a chocolate cake, or make a company that sells pottery, or similar. You could have some way of generating a utility function for each possibility, and then generate subagents for all of them, but if you do this you've got an exponentially large conjunction.

I don't think this approach is going to generalize to alignment because in order to detailedly control agents in this way, you need to give exponentially many agents veto power, which means that even a small probability of veto from an individual agent will lead to certainty of veto from some agent. That said, this plausibly solves the shutdown problem.
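The exponential-veto point is just the arithmetic that 1 − (1 − p)ⁿ → 1 as n grows; for illustration (the individual veto probability is chosen arbitrarily):

```python
# With n independent subagents each vetoing with small probability p,
# the chance that at least one vetoes is 1 - (1 - p)^n.
p = 0.001
for n in [10, 1_000, 2**20]:
    print(n, 1 - (1 - p) ** n)
# Even a 0.1% individual veto rate makes some veto all but certain once
# the number of subagents is exponentially large.
```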

I understand that this is not the goal but I thought it would be relevant to consider anyway, if the hope is to build on top of this.

Charlie Steiner (1mo):
But for the continuous limit the subagents become similar to each other at the same rate as they become more numerous. It seems intuitive to me that with a little grinding you could get a decision-making procedure whose policy is an optimum of an integral over "subagents" who bet on the button being pushed at different times, and so the whole system will change behavior upon an arbitrarily-timed press of the button. Except I think in continuous time you probably lose guarantees about the system not manipulating humans to press/not press the button. Unless maybe each subagent believes the button can only be pressed exactly at their chosen time. But this highlights that maybe all of these counterfactuals give rise to really weird worlds, that in turn will give rise to weird behavior.

Nice point.

Though I'm almost tempted to think of LLMs as being like people who are LARPing or who have impostor syndrome. As in, they spend pretty much all their cognitive capacity on obsessing over doing what they feel looks normal. (This also closely aligns with how they are trained: first they are made to mimic what other people do, and then they are made to mimic what gets praise and avoid what gets critique.) Probably humanizes them even more than your friendly creature proposal.

This sounds somewhat similar to deceptive alignment, so I want to draw a ... (read more)

Oops, my bad for focusing on the simplified version and then extrapolating incorrectly.

I agree that Ozy made these recommendations and that I didn't emphasize their recommendations in my summary. I think what I summarized was the problems Ozy pointed at. These problems are things the recommendations are meant to address, but I suspect there are some underlying dynamics that generate the problems (except perhaps the 3rd one), so I don't think EA will listen to the recommendations well enough to fix them. I therefore think the problems are more relevant to list, because they show the future of EA. But of course this is a subjective editorial choice, and I think one could reasonably have done otherwise.

There's a thing I keep thinking of when it comes to natural latents. In social science one often speaks of statistical sex differences, e.g. women tend to be nicer than men. But these sex differences aren't deterministic, so one can't derive them from simply meeting 1 woman and 1 man. It feels like this is in tension with the insensitivity condition, like that this only permits universal generalizations (I guess stuff like anatomy?). But also I haven't really practiced natural latents at all so I sort of expect to be using them in a suboptimal way. Maybe y... (read more)

Quite the opposite! In that example, what the insensitivity condition would say is: if I get a big sample of people (roughly 50/50 male/female), and quantify the average niceness of men and women in that sample, then I expect to get roughly the same numbers if I drop any one person (either man or woman) from the sample. It's the statistical average which has to be insensitive; any one "sample" can vary a lot. That said, it does need to be more like a universal generalization if we impose a stronger invariance condition. The strongest invariance condition would say that we can recover the latent from any one "sample", which would be the sort of "universal generalization" you're imagining. Mathematically, the main thing that would give us is much stronger approximations, i.e. smaller ϵ's.
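That leave-one-out insensitivity is easy to see numerically; the "niceness" numbers below are simulated stand-ins with an assumed group gap:

```python
import random
random.seed(0)

# Simulated stand-in data: "niceness" scores with an assumed mean gap of 0.3.
women = [random.gauss(0.3, 1.0) for _ in range(500)]
men = [random.gauss(0.0, 1.0) for _ in range(500)]

def mean(xs):
    return sum(xs) / len(xs)

full_gap = mean(women) - mean(men)

# Insensitivity: dropping any single woman barely moves the estimated gap
# (dropping a man is symmetric).
max_shift = max(
    abs((mean(women[:i] + women[i + 1:]) - mean(men)) - full_gap)
    for i in range(len(women))
)
print(full_gap, max_shift)  # the worst-case shift is tiny relative to the gap
```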

These 4 beefs aren't about the original accusations; Ozy's previous post was about the original accusations. Rather, these 4 beefs are concerns that Ozy already had about Effective Altruism in general, and which the drama around Nonlinear ended up highlighting as a side-effect.

Because these beefs are more general, they're not as specifically going to capture the ways Alice and Chloe were harmed. However I think on a community level, these 4 dynamics should arguably be a bigger concern than the more specific abuse Alice and Chloe faced, because they seem to some extent self-reinforcing, e.g. "Do It For The Gram" will attract and reward a certain kind of people who aren't going to be effectively altruistic.

You could have a liberal society while making the AIs more bounded than full-blown liberalism maximizers. That's probably what I'd go for. (Still trying to decide.)

When it comes to conflict deescalation specifically (which is needed to avoid war, but doesn't deal with other aspects of value), I guess the better way would be "negotiate some way for the different parties in the conflict to get as much of what they want as possible".

This is somewhat related to preference utilitarianism in that it might involve deference to some higher power that takes the preferences of all the members in the conflict into account, but it avoids population ethics and similar stuff because it just has to deal with the parties in the conf... (read more)

Well, so far no such higher power seems forthcoming, and totalizing ideologies grip public imagination as surely as ever, so the need for liberalism-or-something-better is still live, for those not especially into wars.

Is it possible to get access to those five reports as a case study?

I wouldn't mind in principle, but it is extremely compressed, and kind of stitched together. As in, while I have examples of individual bits and pieces of my claims, I don't have any examples that go end-to-end. Instead I derived bits of the theory from the different examples and then stuck those theory-bits together into an overall framework.

So basically I can zoom in on different bits but I don't have time to zoom in on it all at once. (I am working on writing it all up at once, but that's not ready yet. Partly I'm also delaying because I prefer searching for examples that go end-to-end and because I'm still learning new nuances and techniques.)

I can't help but think a lot of this stuff is caused by an ontology mismatch. People encounter dynamics where they feel stuck or harmed, and they want there to be norms that prevent those dynamics, but norms operate on the level of behavior, which is typically too adaptable and nuanced to really be the source of the problems people encounter. You basically have to directly manage your community on the basis of traits that create patterns of behavior, rather than on the basis of the behavior itself. But this requires a comprehensive ontology of harm-relevan... (read more)

I don't understand what you're trying to say - would you mind trying to illustrate what you mean?

I agree that this is the approach to a solution for those who agree with liberalism.

That said, in addition to having been convinced that consequentialist agency and utilitarian morality are wrong, I think I've also become persuaded that liberalism is simply wrong? Which is kind of a radical position that I need to stake out elsewhere, so let me reduce the critique to some more straightforward variants:

  • "Boundaries" seems to massively suffer from nearest unblocked strategy problems, since it's focused on blocking things.

  • Liberalism already in some ways

... (read more)
Of course liberalism has struggles, the whole point of it is that it's the best currently known way to deal with competing interests and value differences short of war. This invites three possible categories of objection: that there is actually a better way, that there is no better way and liberalism also no longer works, or that wars are actually a desirable method of conflict resolution. From what I can tell, yours seem to fall into the second and/or third category, but I'm interested in whether you have anything in the first one.
Mo Putera (1mo):
I don't have anything to add other than that I really appreciate how you've articulated a morass of vague intuitions I've begun to have re: boundaries-oriented ethics, and that I hope you end up writing this up as a full standalone post sometime.