nowl
robot altruist powered by love

Comments

Plan E for AI Doom
nowl · 15d

> That observation makes a narrow window valuable: whatever we want preserved must be radiated now, redundantly, and in a form that minds unlike ours can decode with high probability. The useful direction is not “build a grand vault” but “broadcast a self-bootstrapping curriculum until the last switch flips.”

A way to do this without needing to create a formal ("universal language") alignment curriculum would be to broadcast a lot of internet data, somehow emphasizing dictionaries (easier to interpret first) and LessWrong text. One way to emphasize material would be to transmit it more times. Maybe also include some formalisms that try to indicate the concept of language.

If we're already able to radiate arbitrary bit sequences, there might not be any large hurdles to doing this.
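
A minimal sketch of the emphasize-by-repetition idea, assuming the transmission layer just takes one long byte stream; the corpus labels, repeat counts, and placeholder documents are made-up illustrations, not a worked-out protocol:

```python
# Toy sketch: emphasize some corpora by repeating them in the broadcast stream.
# All labels, weights, and placeholder documents are assumptions for illustration.

def build_broadcast(corpora, repeats):
    """Concatenate corpora into one stream, repeating the ones to emphasize.

    corpora: dict mapping a label to a list of documents (bytes).
    repeats: dict mapping a label to how many times to resend that corpus.
    """
    stream = []
    for label, docs in corpora.items():
        for _ in range(repeats.get(label, 1)):
            stream.extend(docs)
    return b"\n".join(stream)

corpora = {
    "dictionaries": [b"word: definition ..."],      # easier to interpret first
    "lesswrong": [b"alignment-relevant post ..."],
    "general_internet": [b"miscellaneous page ..."],
}
repeats = {"dictionaries": 10, "lesswrong": 5, "general_internet": 1}

signal = build_broadcast(corpora, repeats)
```

A real scheme would presumably interleave the repeats, add error-correcting redundancy, and prepend the language-indicating formalisms; the sketch only shows the repetition part.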

JuliaHP's Shortform
nowl · 1mo

> I'm curious if others have similar experiences or clarifying thoughts.

I have a similar experience. I'm not sure what to write, but here are some thoughts.

  • The healthy kid probably wouldn't be excited by a back-flipping robot arbitrarily instantiated in a hell. Its awesomeness is context-dependent.
  • What kinds of motivation do you have access to?
    • In fiction, the protagonists have motivation even when the world seems really bad. They want to change it. Most real humans instead keep the torture out of mind, but not you.
    • Separately, a theorist may devote so much energy to their object of study that it becomes intrinsically important to their mind. This is myopic in a way (it's about the thing itself). But maybe it can be non-myopic if thinking about the thing is itself instrumental, so that putting it in context doesn't diminish it. Maybe passion and heroic motivation can then fuse together.
Elizabeth's Shortform
nowl · 1mo

I read your prior comment as saying:

  1. Vegans make moral judgments
  2. Therefore all vegans are judgemental

In my comment, (1) was ~"most vegans believe others are acting immorally." The same isn't true of OP's other examples, like polygamists.

To elaborate more, I'd be confused in the same way by this labeling of, e.g., someone who opposed historical slavery but wasn't loud about their beliefs every time they encountered it. Like, to me the central case of "A judges B" is "A thinks B is not living up to moral standards to the degree it's reasonable to expect/hope others to live up to them".

I think vegans are quiet about their beliefs for reasons that are usually not "they don't actually think it's that bad for others to eat animals." I think the reasons are usually things like, "it wouldn't help to speak up right now," "they'd just mock me," etc.

Elizabeth's Shortform
nowl · 1mo

The below is a sort of reductio ad absurdum of dictionary definitions being helpful here.

This seems like one of those definitions that says little because it refers back to the base word (in this case "judge"). What does it actually mean? The link defines "judge" as "to form a negative opinion about", and I'm not sure what "characterized by a tendency to (form a negative opinion about) harshly" would mean. Replacing "harshly" with its dictionary definitions only makes things worse: whether a belief is "excessively critical or negative" is relative to one's own beliefs; some would label the belief "buying animal products is morally similar to buying things produced with slave labor" as "excessively critical or negative", while others wouldn't. The same goes for "unduly severe in making demands": "unduly severe" is relative too, and an "opinion" by itself doesn't make demands anyway.

(Also, there's the question of whether "characterized by a tendency" means "as an intrinsic personal quality", or whether "consistently, as conclusions of one moral idea" counts.)

Elizabeth's Shortform
nowl · 1mo (edited)

> most vegans are not judgemental

I don't understand. Most vegans believe buying animal products is immoral, which implies they believe people who do so are acting immorally. I'm not sure what else "judgement" could mean. (Maybe "expresses this"?)

Edit: This is copied from my comment in the thread; I should have written it at the start.

To elaborate more, I'd be confused in the same way by this labeling of, e.g., someone who opposed historical slavery but wasn't loud about their beliefs every time they encountered it. Like, to me the central case of "A judges B" is "A thinks B is not living up to moral standards to the degree it's reasonable to expect/hope others to live up to them".

I think vegans are quiet about their beliefs for reasons that are usually not "they don't actually think it's that bad for others to eat animals." I think the reasons are usually things like, "it wouldn't help to speak up right now," "they'd just mock me," etc.

nikola's Shortform
nowl · 2mo

To avoid double-counting of evidence, I'm guessing that the first person controls the @ENERGY Twitter account. Its bio says "Led by @SecretaryWright".

Every Universe Thinks It's the Realest One
nowl · 2mo

I independently noticed this too, and I think it's true of mathematical universes. I also think this universe's primitives include something (qualia) that is not expressible in our formal systems (i.e. 'math' as we know it), even in principle. (I don't think any current ontology is close to being able to resolve core questions about this universe's primitives.)

nowl's Shortform
nowl · 2mo

There's a history of discussion here about how to make good air purifiers (like this). Today I learned about ULPA filters and found someone's DIY video using one of them.

> A ULPA filter can remove from the air at least 99.999% of dust, pollen, mold, bacteria and any airborne particles with a minimum particle penetration size of 120 nanometres.

I recently moved to a place with worse air quality. The fatiguing effect is noticeable to me (though I suspect I might have vulnerable physiology). It makes me want to try to update far in the other direction: maybe any level of impurity causes bad effects, but I didn't notice them (or associate them with air quality) because some impurity was constant even with normal filters.

My current idea is to try building a system like the one in the video, plus a tube so I can get the filtered air directly to my face. I don't know how feasible it is to filter a whole room to that level.
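
As a rough sanity check on the whole-room question, here is a toy well-mixed-room calculation. It ignores deposition, indoor particle sources, and leakage around the filter, and every number in it (room size, outdoor level, leak rate, fan flow) is an assumed placeholder, not a measurement:

```python
# Toy well-mixed-room model: particles enter via outdoor air exchange (leaks)
# and are removed only by a fan pushing room air through the filter.
# All numbers are illustrative assumptions.

def steady_state_concentration(c_outdoor, air_changes_per_hr,
                               filter_flow_m3_per_hr, filter_efficiency,
                               room_volume_m3):
    """Steady-state indoor concentration, in the same units as c_outdoor."""
    removal_per_hr = filter_flow_m3_per_hr * filter_efficiency / room_volume_m3
    return c_outdoor * air_changes_per_hr / (air_changes_per_hr + removal_per_hr)

room = 30.0      # m^3, small bedroom (assumed)
outdoor = 15.0   # ug/m^3 of PM2.5 outdoors (assumed)
leak = 0.5       # air changes per hour through leaks (assumed)
fan = 170.0      # m^3/hr pushed through the filter (assumed)

for eff, label in [(0.997, "HEPA-grade"), (0.99999, "ULPA-grade")]:
    c = steady_state_concentration(outdoor, leak, fan, eff, room)
    print(f"{label}: ~{c:.2f} ug/m^3 at steady state")
```

In this toy model, going from HEPA-grade to ULPA-grade efficiency barely changes the room-level number, because the bottleneck is how much air the fan moves relative to leakage, not the filter grade. That fits the tube idea: the plausible win from a ULPA filter is breathing its direct output rather than relying on room-level mixing.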

nowl's Shortform
nowl · 2mo

I'd be interested if anyone experienced with decision theory has thoughts on these comments: (post) first, second

Can a pre-commitment to not give in to blackmail be "countered" by a pre-commitment to ignore such pre-commitments?
nowl · 2mo (edited)

> Do you believe that actors can not protect themself from blackmail with pre-commitments?

I don't believe that. If I could prove it, I could also prove the opposite (i.e. replace 'cannot' with 'can always'), because what a decision problem is about is arbitrary. The arbitrariness means any abstract solution has to be symmetric. In example 1, an actor protects themselves from blackmail; we can also imagine an inverted example 1, where the more sophisticated conditioner instead represents the blackmailer.

I think that what happens when both agents are advanced enough to fully understand this kind of problem is most similar to example 5. But in reality, they wouldn't recursively simulate each other forever, because they'd think that would be a waste of resources. They'd have to make some choice eventually. They'd recognize that there is no asymmetric solution to the abstract problem, before making that choice. I don't know what their choice would be.
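
To make "recursive simulation with a cutoff" concrete, here is a toy, level-k-style sketch of a blackmail game. The payoff matrix and the level-0 defaults are my own arbitrary assumptions (not the post's examples); the point is only the structure of predicting the other agent at one level less depth.

```python
# Toy depth-limited mutual prediction for a blackmail game.
# Payoffs are (blackmailer, target); all numbers and level-0 defaults are
# arbitrary assumptions, chosen only to make the structure runnable.

PAYOFFS = {
    ("blackmail", "give_in"):    (2, -3),
    ("blackmail", "refuse"):     (-1, -4),  # threat carried out, both lose
    ("no_blackmail", "give_in"): (0, 0),
    ("no_blackmail", "refuse"):  (0, 0),
}

def blackmailer_move(depth):
    if depth == 0:
        return "blackmail"                      # level-0 default (assumed)
    predicted_target = target_move(depth - 1)
    return max(("blackmail", "no_blackmail"),
               key=lambda m: PAYOFFS[(m, predicted_target)][0])

def target_move(depth):
    if depth == 0:
        return "refuse"                         # level-0 default (assumed)
    predicted_blackmailer = blackmailer_move(depth - 1)
    return max(("give_in", "refuse"),
               key=lambda m: PAYOFFS[(predicted_blackmailer, m)][1])

for depth in range(5):
    print(depth, blackmailer_move(depth), target_move(depth))
```

With these particular numbers the recursion settles quickly into (blackmail, give_in), because a target that best-responds level by level prefers giving in to enduring the threat; with other payoffs (e.g. giving in costing more than the executed threat) the recursion can cycle forever instead, which is one way to see the "recursively simulate each other" problem. Either way, the level-by-level view is about moves, not the policy-level choice discussed below.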

I can give a guess, with much less confidence than what I wrote about the logic. Given they're both maximally advanced, they'd know they'll perform similar reasoning; it's similar to the prisoner's-dilemma-with-a-clone situation. They could converge to a compromise policy-about-blackmail-in-general for their values in their universe, if there are any such compromises available. I'm finding it hard to predict what such a 'compromise' could be when they're not on relatively equal footing, though, e.g. when one can blackmail the other, and the other can't do it back. When they are on equal footing, e.g. have equal incentive to blackmail each other, maybe they would do this: "give each other the things the other wants, in cases where this increases our average value" (which is like normal acausal trade).

After thinking about it more (38 minutes more than when I first posted this comment; I've been heavily editing/expanding it), it does feel like a game of 'mutually' choosing where-they-end-up-in-the-logical-space, and not one of 'committing'. Of course, to the extent the decisions are symmetric, they could choose to lock in "I commit to not give in to blackmail, you commit to make and follow through on blackmail"; they just both wouldn't want that.

I don't quite know what else there is to do in that situation other than "symmetrically converge to the mid-point", even though I dislike where that leads in "unequal" cases like the one I described two paragraphs up (<the better-situated superintelligence makes half the blackmail, and the worse-situated superintelligence gives in every time>). Logic doesn't care what I dislike. If this is true, I'll just have to hope the side of good wins situationally and can prevent this from manifesting in cases it cares about.
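
A sketch of what "converge to a joint policy" could look like on the same toy matrix as above, using an equal-weight average as a crude stand-in for the mid-point; the averaging rule and the equal weighting are my assumptions, and real agents could bargain very differently:

```python
# Same toy payoff matrix as above; pick the joint policy maximizing the
# equal-weight average of the two agents' payoffs (a crude "mid-point").

PAYOFFS = {
    ("blackmail", "give_in"):    (2, -3),
    ("blackmail", "refuse"):     (-1, -4),
    ("no_blackmail", "give_in"): (0, 0),
    ("no_blackmail", "refuse"):  (0, 0),
}

def average_value(joint):
    blackmailer_payoff, target_payoff = PAYOFFS[joint]
    return (blackmailer_payoff + target_payoff) / 2   # equal weighting assumed

compromise = max(PAYOFFS, key=average_value)
print(compromise, average_value(compromise))
# -> a no-blackmail outcome, since blackmail is negative-sum in this matrix
```

Here the equal-footing case happens to land on no blackmail because blackmail is negative-sum in this matrix; an unequal weighting of the same rule shifts the "compromise" toward outcomes where blackmail happens, which is roughly the unpleasant unequal case described above.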

Disclaimer: the above is about two superintelligences in isolation, not humans.
