All of Gordon Seidoh Worley's Comments + Replies

Like it. Seems like another way of saying that sometimes what you really need is more dakka. Tagged the post as such to reflect that.

I think it's also the case that there are no true laws of which we can speak, but instead various interpretations of laws, though hopefully most people's interpretations agree, and much of the challenge within the legal profession is figuring out how to interpret laws.

Within common law (the only legal system I'm really familiar with), this issue of interpretation is actually a key aspect of the system. Judges serve the purpose of providing interpretations, and law is operationalized via decisions on cases that set precedents that can inform future in... (read more)

For a less rationalist-flavored take on the same point, I recommend this YouTube video:

This is how I used to buy clothes. At least in my case I got some hard advice from a friend: I was picking pieces of clothing that were fine in isolation but didn't really come together to create a look/fit that was me, which made me look unintentional and thus less good. It also made it too easy to optimize for function at the expense of form, to the point of picking things that met great functional requirements but looked bad, like technical hiking pants that met tons of needs other than looking good or fitting my body well.

In order to actually look put together I realized that I needed to take a more global approach to my clothes optimization.

Yes, I agree that if "practical problem in your life" did not include "looking good" or "goes with my other clothes" as design parameters then you'd probably end up in a situation like that. I succeeded at avoiding this problem because I specifically set out to find pants that were good for biking and looked like professional work pants (fortunately I already had some that did). This can be useful: it puts a sharp constraint on the shirts I buy, requiring them to look good with these specific pants. That limitation can be helpful in making the overwhelming number of choices manageable.

Actually, I kind of forgot what ended up in the paper, but then I remembered, so I wanted to update my comment.

There was an early draft of this paper that talked about deontology, but because there are so many different forms of deontology it was hard to come up with arguments where there wasn't some version of deontological reasoning that broke the argument, so I instead switched to talking about the question of moral facts independent of ethical system. That said, the argument I make in the paper suggesting that moral realism is more dangerous than moral an... (read more)

I sometimes literally have to say this in long threads. Sometimes in a thread of conversation, my interlocutor simply has too big an inferential gap for me to help them cross, and the kind but maybe not maximally nice thing to do is stop wasting both our time. This happens for a variety of reasons, and being able to express something about it is useful.

In everyday conversation we have norms against this because such statements are status moves to shut down conversations, and making such a move here does risk a status hit if others think you are making a gambit to give up a line of conversation that is proving you wrong, for example. But ultimately there's nothing in the reacts you can't just say with a comment.

I don't see it in the references so you might find this paper of mine (link is to Less Wrong summary, which links to full thing) interesting because within it I include an argument suggesting building AI that assumes deontology is strictly more risky than building one that does not.

William D'Alessandro (7d):
Excellent, thanks! I was pretty confident that some other iterations of something like these ideas must be out there. Will read and incorporate this (and get back to you in a couple days).

I don't think there's anything wrong with presenting arguments that the orthogonality thesis might be false. However, if those arguments are poorly argued or just rehash previously argued points without adding anything new then they're likely to be downvoted.

I actually almost upvoted this because I want folks to discuss this topic, but ultimately downvoted because it doesn't actually engage in arguments that seem likely to convince anyone who believes the orthogonality thesis. It's mostly just pointing at a set of intuitions that cause surprise at the orth... (read more)

To be fair, I'm not saying it's obviously wrong; I'm saying it's not obviously true, which is what many people seem to believe!

If it is the case that OpenAI is already capable of building a weakly general AI by this process, then I guess most of the remaining uncertainty lies in determining when it's worthwhile for them or someone like them to do it.

Also while I'm leaving feedback, I think there's too much nuance/overlap between some of the reactions. I think I'd prefer a smaller set that was something like:

  • insightful
  • confusing
  • i like this
  • i love this
  • i disagree and am maybe angry about it
  • this post is offputting in some other way that violates norms
Max H (11d):
I like the wide variety of possible reactions available in Slack and Discord, though I think for LW, the default/starting set could be a bit smaller, to reduce the complexity/overwhelming-ness of picking an appropriate reaction. Reactions I'd strike:

  • Additional questions (I'd feel a bit disconcerted if I received this reaction without an accompanying comment.)
  • Strawman (kinda harsh for a reaction)
  • Concrete (this is either covered by an upvote, or seems like faint praise if not accompanied by an upvote.)
  • One of "key insight" or "insightful", and one of "too harsh" or "combative" (too much overlap)

But maybe it's easier to wait and see which reactions are used least often, and then eliminate those.
I think nuance is good. A drawback is that on comments that get a lot of reactions it could be too much information for people to pay attention to as compared with a smaller set, but I think that is a worthwhile tradeoff.

Basically my theory is that reactions should be clearly personal reactions and stuff that can't be objected to (e.g. I can't object if you found my presentation overcomplicated, that's just how you felt about it), and anything that can be read as a bid to make claims should not be included because there's no easy way to respond to a reaction. I think on these grounds I also dislike the "strawman" and "seems borderline" reactions.

Regarding "overcomplicated", it seems to me there is an ambiguity between whether it refers to the presentation or the underlying ideas. Perhaps "muddled" could be used to refer to the overcomplication of the presentation, but in that case it could also suggest that the commenter's ideas are muddled - and I don't like the "muddled" name; it seems too judgemental, and people might avoid using it so as not to be seen as overly harsh. I think it would be useful for people to be able to compactly express specific subjective reactions to the underlying comment, but a lot of the value would come from them being very precise, and distinguishing between e.g. whether it is the user's reaction to their (possibly flawed) perception of the ideas represented or just the user's reaction to the presentation alone would be part of that precision.

I think the "no easy way to respond to a reaction" is an important point. Maybe there should be a way to respond to a reaction! I used the "Seems Borderline" reaction partly because I think things along these lines seem likely useful despite me agreeing that the reactions probably should be focused on subjective opinions, and partly just to be funny since you are objecting to that reaction. Someone downvote my "seems borderline" reaction to test downvoting of reactions!
I'm not entirely sure I agree, but I think "what if reacts were only mapped to internal-action-reactions" is an interesting prompt. I do think I still want "seems false" or "seems unlikely" – they're important facets of my reaction to a thing, but the "seems" part feels important. I do generally feel, looking at the reacts, that they're slightly the-wrong-type-signature, i.e. I never feel an impulse to say "virtue of scholarship" but I might say "nice scholarship!" or something.

Neat. Looking at the list of reactions, one jumps out to me as out of place: the "wrong" reaction. The others reflect various feelings or perceptions and can be interpreted that way, but the "wrong" one seems too strong to me and overlaps with the existing agree/disagree voting. If you think something is wrong and want more than the disagree vote, seems like that's a case where we want to incentivize posting a reply rather than just leaving a "wrong" react with no explanation.

One thing I wanna flag is that I suspect, if the react experiment seemed worthwhile and the kinks were ironed out, I'd probably want to replace agree/disagree karma with some flavor of agree/disagree reacts (mostly to minimize overall site complexity). One thing that gives me some pause there is that I (mostly) think reacts make more sense if they have names attached, so that they are more like little microcomments for when you're too lazy to comment but want to react somehow, but I think there was something good about agree/disagree not having names attached.

  • EA, for various cultural reasons, is a toxic brand in China. It's not any single component of EA, but rather the idea of altruism itself. Ask anyone who's lived in China for a few years and they will understand where I'm coming from. I think the best way forward for AI safety in China is to disassociate from EA. Rationality is more easily accepted, but spreading related ideas is not the most effective way to address AI safety in China.

I'd like to know more about this. What's the deal with altruism in China? Why is altruism disliked?

Rudi C (12d):
E.S.: personal opinion

Because proclaimed altruism is almost always not. In particular, SBF and the current EA push to religiously monopolize AI capability and research trigger a lot of red flags. There are even upvoted posts debating whether it's "good" to publicize interpretability research. This screams cultist egoism to me.

Asking others to be altruistic is also a non-cooperative action. You need to pay people directly, not bully them into working for the greater good. A society in which people aren't allowed to have their self-interest as a priority is a society of slave bees. Altruism needs to be self-initiated and shown, not told.
There was some good discussion about this in the previous AI safety in China thread. [] 

If you haven't seen it you might find this book I'm writing interesting, especially chapter 7 which directly addresses pragmatism (chapter 6 does also but only at the conclusion).

I like that this offers a clearer theory of what boundaries are than most things I've read on the subject. I often find the idea of boundaries weird, not because I don't understand that sometimes people need to put up social defenses of various kinds to feel safe, but because I've not seen a very crisp definition of boundaries that didn't produce a type error. Framing them in terms of bids for greater connection hits at a lot of what I think folks care about when they talk about setting boundaries, so it makes a lot more sense to me now than my previous understanding, which was more like "I'm going to be emotionally closed here because I can't handle being open" - which is still kind of true but mixes in a lot of stuff and so is not a crisp notion.

I really like this idea, since in an important sense these are accident risks: we don't intend for AI to cause existential catastrophe but it might if we make mistakes (and we make mistakes by default). I get why some folks in the safety space might not like this framing because accidents imply there's some safe default path and accidents are deviations from that when in fact "accidents" are the default thing AI do and we have to thread a narrow path to get good outcomes, but seems like a reasonable way to move the conversation forward with the general pub... (read more)

To the point about using NVC for positive things too: as a manager I try to keep something like this in mind when giving feedback to reports, both to signal where they need to improve and to let them know when they're doing well. I picked up the idea from reading books about how to parent and teach kids, but the same ideas apply.

The big thing, as I think of it, is to avoid making fundamental attribution errors, or as I really think of it, don't treat your observations of behavior patterns as essential characteristics of a person. Both negative and positive ... (read more)

If the mind becomes much more capable than the surrounding minds, it does so by being on a trajectory of creativity: something about the mind implies that it generates understanding that is novel to the mind and its environment.


I don't really understand this claim enough to evaluate it. Can you expand a bit on what you mean by it? I'm unsure about the rest of the post because it's unclear to me what the premise your top-line claim rest upon means.

If a mind comes to understand a bunch of stuff, there's probably some compact reasons that it came to understand a bunch of stuff. What could such reasons be? The mind might copy a bunch of understanding from other minds. But if the mind becomes much more capable than surrounding minds, that's not the reason, assuming that much greater capabilities required much more understanding. So it's some other reason. I'm describing this situation as the mind being on a trajectory of creativity.

I appreciate the sentiment, but I find something odd about expecting ontology to be backwards compatible. Sometimes there are big, insightful updates that reshape ontology. Those are sometimes not compatible with the old ontology, except insofar as both were attempting to model approximately the same reality. As an example, at some point in the past I thought of people as having character traits; now I think of character traits as patterns I extract from observed behavior and not something the person has. The new ontology doesn't seem backwards compatible to me, except that it's describing the same reality.

There's a LOT of detail that the word "compatible" obscures.  Obviously, they're not identical, so they must differ in some ways.  This will always and intentionally make them incompatible on some dimensions.  "compatible for what purpose" is the key question here. I'd argue that your character-traits example is very illustrative of this.  To the extent that you use the same clustering of trait definitions, that's very compatible for many predictions of someone's behavior.  Because the traits are attached differently in your model, that's probably NOT compatible for how traits change over time.  There are probably semi-compatible elements in there, as well, such as how you picture uncertainty about or correlation among different trait-clusters.

This seems like a generalization of something that humans are also guilty of. The way we win against other animals also can look kind of dumb from the perspective of those animals.

Suppose you're a cheetah. The elegant, smart way to take down prey is to chase them down in a rapid sprint. The best takedowns are ones where you artfully outmaneuver your prey and catch them right at the moment when they think they are successfully evading you.

Meanwhile you look on humans with disdain. They can take down the same prey as you, but they do it in dumb ways. Sometimes th... (read more)

I'm suspicious of my theory for the same reason.

My own guess here is that access to capital will become more important than it is today by an order of magnitude.

In the forager era capital barely mattered because almost all value was created via labor. With no way to reliably accumulate capital, there was little opportunity to exploit it.

In the farmer era, capital became much more important, mainly in the form of useful land, but labor remained of paramount importance for generating value. If anything, capital made labor more valuable and thus demanded more of it.

In the industrial era, capital became more... (read more)

So I have a vague theory as to why this might work. It's kind of nuts, but the fact that the product works for you at all is kind of nuts, so here we are.

During meditation many people will start to spontaneously rhythmically rock back and forth or side to side as they enter jhana states. This generally coincides with greater feelings of joy, contentment, and focus. I'm not sure why this happens, but I've seen lots of people do it and I do it myself. My best guess is this has something to do with brain wave harmonics.

My guess is that the vibrations of this dev... (read more)

Matt Goldenberg (1mo):
This is the theory behind a lot of entrainment tech, but the entrainment hypothesis always seems to disappear when subjected to more data. But for some reason it still feels intuitively like there should be something there.
My guess is that the alternation unsticks the nervous system to gently chill SNS activation, but am suspicious in my belief that it just happens to match a mechanism I already know about and like.
This is where my mind went first, too. I have no evidence either and haven't tried it myself, but I wouldn't be too surprised if there were simple ways rhythmic stimuli could induce brain states. The right kinds of music can help induce sleep or meditative states or hypnosis, so why not vibration?

Seems unlikely, both because I doubt the premise that an RQ, whatever it looked like, would be significantly more or less trainable than IQ measurements (based on the fact that supposed measures of learnable knowledge like the SAT and GRE are so strongly correlated with IQ), and because if it had any measurement power it would, like the SAT and GRE, quickly become embroiled in politics due to disparities in outcomes between individuals and groups.

Related thought: because intelligence is often a proxy for status, calling someone or something dumb implies low status. This is why, for example, I think people get really worked up about IQ: being smart is not a matter of simple fact, but a matter of status assignment. As a society we effectively punish people for being dumb, and so naturally the 50% of the population that's below the mean has a strong incentive to fight back if you try to make explicit a way in which they may be treated as having lower status. Heck, it's worse than that: if you're not i... (read more)

So if a Rationality Quotient (RQ) became famous for only measuring skills that everyone can build regardless of where they start, rather than innate ability, it'd be less infected than the discourse around IQ?
Adam Zerner (1mo):
I like how you phrased that and agree that descriptively, this is what seems to be going on.

As a manager (and sometimes middle manager) I've been thinking about how LLMs are going to change management. Not exactly your topic but close enough. Here's my raw thoughts so far:

  • the main reasons you need management layers is coordination
    • one person can't directly manage more than about 12 people effectively
    • past that point you need someone to hide some details for you
    • with ai though one director can manage more managers
    • normally coordination costs make it hard to manage more than 6 managers
    • llms may allow a director to manage 12 managers/teams by automating
... (read more)
Very interesting points, if I was still in middle management these things would be keeping me up at night!

One point I query is "this is a totally new thing no manager has done before, but we're going to have to figure it out" -- is it that different from the various types of tool introduction & distribution / training / coaching that managers already do? I've spent a good amount of my career coaching my teams on how to be more productive using tools, running team show-and-tells from productive team members on why they're productive, sending team members on paid training courses, designing rules around use of internal tools like Slack/Git/issue trackers/intranets etc., and it doesn't seem that different to figuring out how to deploy LLM tools to a team.

But I'm rusty as a manager, and I don't know what future LLM-style tools will look like, so I could be thinking about this incorrectly. Certainly if I had a software team right now, I'd be encouraging them to use existing tools like LLM code completion, automated test writing, proof-reading etc., and encouraging early adopters to share their successes & failures with such tools.

Does "no manager has done before" refer to specific LLM tools, and is there something fundamentally different about them compared to past new technologies/languages/IDEs etc?

To expand a bit, I think this post is confusing ontological and ontic existence, or in LW terms mixing up existence in the map and existence in the territory.

Mergimio H. Doefevmil (1mo):
It is my impression that certain people think that illusionists deny that there is any 🟩 even in the map, and I have never heard any illusionist make that argument (maybe I just haven't been paying enough attention though). The conversation seems to be getting stuck somewhere at the level of misunderstandings concerning labels and referents.

The key insight that I am trying to communicate here is that when we say that A is B, we generally do not mean that A is strictly identical to B - which it clearly isn't. This applies even when we say things like 2+2 = 4. Obviously, "2+2" and "4" are not even close to being identical. Everyone understands this, and everyone understands that when we say that 2+2 = 4, we use two different sets of symbols to refer to one single mathematical object. Claiming that greenness is activity in the visual cortex does not amount to denying that there is 🟩.

But again, perhaps I just misunderstand illusionism (although not even Keith Frankish himself would deny that there is 🟩, see the video linked in the post). Are there any illusionists around here who are claiming that 🟩 is not?

As a side note, perhaps I will stop using the verb "to exist" altogether, and instead start using "to be".
Yes, and I think it is worse than that. Even existence in the map is not clearcut. As I said in the other comment, do dragons exist in the map? In what sense? Do they also exist in the territory, given that you can go and buy a figurine of one?

I don't think it makes sense to say that the symbol grounding problem has gone away, but I think it does make sense to say that we were wrong about what problems couldn't be solved without first solving symbol grounding. I also don't think we're really that confused about how symbols are grounded(1, 2, 3), although we don't yet have a clear demonstration of a system that has grounded its symbols in reality. GPTs do seem to be grounded by proxy through their training data, but this gives limited amounts of grounded reasoning today, as you note.

Thanks. My title was a bit tongue-in-cheek ('Betteridge's law'), so yes, I agree. I have decided to reply to your post before I have read your references, as that may take a while to digest, but I plan to do so. I also see you have written stuff on P-Zombies that I was going to write something on. As a relative newcomer it's always a balance between just saying something and attempting to read and digest everything about it on LW first.

My guess would be that it'll be on the level of evals done internally by these companies today to make sure generative AI models don't say racist things or hand out bomb making instructions, etc.

Unclear to me how "serious" this really is. The US government has its hands in lots of things and spends money on lots of stuff. It's more serious than it was before, but to me this seems pretty close to the least they could be doing and not be seen as ignoring AI in ways that would be used against them in the next election cycle.

Nathan Helm-Burger (1mo):
Here's the details on the NSF institutes. They sound mostly irrelevant to AInotkilleveryoneism. Some seem likely to produce minor good things for the world, like perhaps the education and agriculture focused programs. Others seem potentially harmfully accelerationist, like the Neural & Cognitive science program. Cybersecurity might be good; we certainly could do with better cybersecurity. The Trustworthy AI one just sounds like Social Justice AI concerns not relevant to AInotkilleveryoneism.

Trustworthy AI: NSF Institute for Trustworthy AI in Law & Society (TRAILS) []

Led by the University of Maryland, TRAILS aims to transform the practice of AI from one driven primarily by technological innovation to one driven with attention to ethics, human rights, and support for communities whose voices have been marginalized into mainstream AI. TRAILS will be the first Institute of its kind to integrate participatory design, technology, and governance of AI systems and technologies and will focus on investigating what trust in AI looks like, whether current technical solutions for AI can be trusted, and which policy models can effectively sustain AI trustworthiness. TRAILS is funded by a partnership between NSF and NIST.

Intelligent Agents for Next-Generation Cybersecurity: AI Institute for Agent-based Cyber Threat Intelligence and Operation (ACTION) []

Led by the University of California, Santa Barbara, this Institute will develop novel approaches that leverage AI to anticipate and take corrective actions against cyberthreats that target the security and privacy of computer networks and their users. The team of researchers will work with experts in security operations to develop a revolutionary approach to cybersecurity, in which AI-enabled intelligent security agents cooperate with humans across the cyber-defense life cyc
What would be a reasonable standard of action by you? Genuinely asking

Appears to be a duplicate of

to answer my own question:

Level of AI risk concern: high

General level of risk tolerance in everyday life: low

Brief summary of what you do in AI: first tried to formalize what alignment would mean, this led me to work on a program of deconfusing human values that reached an end of what i could do, now have moved on to writing about epistemology that i think is critical to understand if we want to get alignment right

Anything weird about you: prone to anxiety, previously dealt with OCD, mostly cured it with meditation but still pops up sometimes

A form is not just a form. I have to also follow up to make sense of the responses, report back findings, etc. Possibly worth exploring if it seems like there might be something there, but not the effort I want to put in now. By answering here I can ignore this, and others can still benefit if I do nothing else with the idea.

If I learn enough this way to suggest it's worth exploring and doing a real study, sure. This is a case of better done lazily to get some information than not done at all.

This took like literally two minutes to make: google form []. Feel free to copy, edit, and distribute for respondents as you see fit. I do think this thing is worth just having in a google form format.

I think I disagree. Based on your presentation here, I think someone following a policy inspired by this post would be more likely to cause existential catastrophe by pursuing a promising false positive that actually destroys all future value in our Hubble volume. I've argued we need to focus on minimizing false positive risk rather than optimizing for max expected value, which is what I read this as proposing we do.

Thanks for your comment. I introduce the relative/absolute split in notions of truth in a previous chapter, so I expect readers of this chapter, as they progress through the book, to understand what it means.

I don't think this is a good idea. You and others may reasonably disagree, but here's my thinking:

Protests create an us vs. them mentality. Two groups are pitted against each other, with the protestors typically cast in the role of victims who are demanding to be heard.

I don't see this achieving the ends we need. If people push OpenAI to be for or against AI development, they are going to be for development. A protest, as I see it, risks making them dig in to a position and be less open to cooperating on safety efforts.

I'd rather see continued behind the s... (read more)

Dilemma ("choose a side") is a principle of non-violent direct action; why is an us vs. them mentality necessarily a bad thing? Do you oppose protest in principle? Would you say this about the climate movement pressuring fossil fuel companies to transition away from fossil fuels? I think we need both – here's evidence [] for the radical flank effect []. Strongly agree – and this is how all mass movements start, no?

What? Nothing is in conflict; you just took the quotes out of context. The full sentence where Zvi commits is:

So unless circumstances change substantially – either Yann changes his views substantially, or Yann’s actions become far more important – I commit to not covering him further.

If he does comment further, it's likely to be more of the same.
To me responding further does not necessarily imply a change in position or importance, so I still think the sentences are somewhat contradictory in hypothetical futures where Yann responds but does not substantially change his position or become more important. I think the resolution is that Zvi will update only this conversation with further responses, but will not cover other conversations (unless one of the mentioned conditions is met).

I'm doubtful of some of your examples of convergence, but despite that I think you are close to presenting an understanding why things like acausal trade might work.

Which examples? Convergence is a pretty well understood phenomenon in evolutionary theory and biology more broadly these days. Anything outside of our biosphere will likely follow the same trends and not just for the biological reasons given, but for the game theoretical ones too. Acausal trade seems unrelated since what I'm talking about is not a matter of predicting what some party might want/need in a narrow sense, but rather the broad sense that it is preferable to cooperate when direct contact is made. As a tangent, acausal trade is named poorly, since there is a clear causal chain involved. I wish they called it remote reasoning or distant games or something else.

Even if every one of your object level objections is likely to be right, this wouldn't shift me much in terms of policies I think we should pursue because the downside risks from TAI are astronomically large even at small probabilities (unless you discount all future and non-human life to 0). I see Eliezer as making arguments about the worst ways things could go wrong and why it's not guaranteed that they won't go that way. We could get lucky, but we shouldn't count on luck, so even if Eliezer is wrong he's wrong in ways that, if we adopt policies that acc... (read more)

Eliezer believes and argues that things go wrong by default, with no way he sees to avoid that. Not just "no guarantee they won't go wrong". It may be that his arguments are sufficient to convince you of "no guarantee they won't go wrong" but not to convince you of "they go wrong by default, no apparent way to avoid that". But that's not what he's arguing.

I am reasonably sympathetic to this argument, and I agree that the difference between EY's p(doom) > 50% and my p(doom) of perhaps 5% to 10% doesn't obviously cash out into major policy differences.

I of course fully agree with EY/bostrom/others that AI is the dominant risk, we should be appropriately cautious, etc. This is more about why I find EY's specific classic doom argument to be uncompelling.

My own doom scenario is somewhat different and more subtle, but mostly beyond scope of this (fairly quick) summary essay.

This doesn't seem especially "global" to me then. Maybe another term would be better? Maybe this is a proximate/ultimate distinction?

Currently using "task specific"/"total".
Hmm, the etymology was that I was using "local optimisation" to refer to the kind of task specific optimisation humans do. And global was the natural term to refer to the kind of optimisation I was claiming humans don't do but which an expected utility maximiser does.

I'm suspicious of your premise that evolution or anything is doing true global optimization. If the frame is the whole universe, all optimization is local optimization because of things like the speed of light limiting how fast information can propagate. Even if you restrict yourself to a Hubble volume this would still be the case. In essence, I'd argue all optimization is local optimization.

The "global" here means that all actions/outputs are optimising towards the same fixed goal(s):

Appreciate you thinking about this question, but I also downvoted the post. Why? This is the kind of low effort, low context post that I don't want to see a lot of on LessWrong.

I think a good version of this question would have been if you presented some more context rather than just rehashing a quite old and well-worn topic without adding anything new. For example, if there's some new information specifically leading you to rethink the standard takes on this question making it worth reconsideration.

I don't have especially strong opinions about what to do here. But, for the curious, I've had run-ins with both Said and Duncan on LW and elsewhere, so perhaps this is useful background information for folks outside the moderation team looking at this who aren't already aware (I know the moderators are aware of basically everything I have to say here because I've talked to some of them about these situations).

Also, before I say anything else, I've not had extensive bad interactions with either Said or Duncan recently. Maybe that's because I've been writing a book instea... (read more)

Sorry for the lack of links above.

I affirm the accuracy of Gordon's summary of our interactions; it feels fair and like a reasonable view on them.

Said Achmiz (2mo):
To clarify—you’re not including me in the “and elsewhere” part, are you? (To my knowledge, I’ve only ever interacted with you on Less Wrong. Is there something else that I’m forgetting…?)

For what it's worth, I think you're unusually uncomfortable with people doing this. I've not read the specific thread you're referring to, but I recall you expressing especially and unusually high dislike for others performing analysis of your mind and motivations.

I'm not sure what to do with this, only that I think it's important background for folks trying to understand the situation. Most people dislike psychoanalyzing to some extent, but you seem like P99 in your degree of dislike. And, yes, I realize that annoyingly my comment is treading in the direction of analyzing you, but I'm trying to keep it just to external observations.

[DEACTIVATED] Duncan Sabien (2mo):
(I also note that I'm a bit sensitive after that one time that a prominent Jewish member of the rationality community falsely accused me on LW of wanting to ghettoize people and the comment was highly upvoted and there was no mod response for over a week. I imagine it's rather easier for other people to forget the impact of that than for me to.)
[DEACTIVATED] Duncan Sabien (2mo):
I don't mind people forming hypotheses, as long as they flag that what they've got is a hypothesis. I don't like it when people sort of ... skim social points off the top? They launch a social attack with the veneer that it's just a hypothesis ("What, am I not allowed to have models and guesses?"), but (as gjm pointed out in that subthread) they don't actually respond to information, and update. Observably, Said was pretending to represent a reasonable prior, but then refusing to move to a posterior. That's the part I don't like. If you gamble with someone else's reputation and prove wrong, you should lose, somehow; it shouldn't be a free action to insinuate negative things about another person and just walk away scot-free if they were all false and unjustified.

I think this seems worth digging into. I've done my own digging, though not spent a lot of time thinking in detail about how it generalizes to minds unlike ours, although I think there's some general structure here that should generalize.

Hard agree. I think there's a tendency among folks to let fears of doom eat their minds in ways that make them give up without even trying. Some people give up outright. Others think they're trying to avert doom, but they've actually given up: they keep trying because they're anxious and don't know what else to do, yet they don't expect their attempts to work, so they only make a show of it rather than doing things that stand a real chance of reducing the probability of doom.

I like your analogy to video games. I play DoTA (and have for a long time; ... (read more)

My comment will be vague because I'm not sure how much permission I have to share this or if it's been publicly said somewhere and I'm just unaware, but I talked to an AI researcher at one of the major companies/labs working on things like LLMs several years ago, before even GPT-1 was out, and they told me that your reason 10 was basically their whole reason for wanting to work on language models.

Seth Herd (2mo):
Good for them! I'm really happy some of them saw this coming. To my embarrassment, neither I nor anyone else I know in the community saw this coming. I did see self-talk wrappers for LLMs as a way to give them agency; I haven't said anything since it could be an infohazard for capabilities. But I didn't notice how easy that would make initial alignment, or I would've been shouting about it. I'm sure some people have thought of this, and my hat is off to all of them. To be clear, this doesn't make all of alignment easy. As I say in Point 10. But I think it drastically improves our odds.

I can see you're taking a realist stance here. Let me see if I can take a different route that makes sense in terms of realism.

Let's suppose there are moral facts and some norms are true while others are false. An intelligent AI can then determine which norms are true. Great!

Now we still have a problem, though: our AI hasn't been programmed to follow true norms, only to discover them. Someone forgot to program that bit in. So now it knows what's true, but it's still going around doing bad things because no one made it care about following true norms.

This i... (read more)

IOW, moral norms being intrinsically motivating is a premise beyond them being objectively true.