David Udell

Sequences

Insights from Dath Ilan
Winding My Way Through Alignment

Comments

How to Visualize Bayesianism

They seem hard(er) to visualize moment to moment as updates come in, imo. One of the neat things about lines is that the renormalization step of an observational update isn't cognitively demanding, whereas translating the areas of disks-with-bites-taken-out-of-them-by-an-inconsistent-observation into new disks with those same areas is cognitively demanding.

But different Bayesian visualizations will work best for different people.
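For concreteness, here's a minimal sketch of the renormalization step I mean, with hypotheses and numbers made up for illustration: the observation takes a bite out of each hypothesis's mass, and the surviving mass then has to be rescaled to sum to one.

```python
# Minimal sketch of an observational update; hypotheses and numbers are made up.
prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# Likelihood of the observation under each hypothesis -- the "bite"
# an inconsistent observation takes out of each hypothesis's mass:
likelihood = {"H1": 0.9, "H2": 0.1, "H3": 0.0}

# Unnormalized posterior mass: what's left of each region after the bite.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}

# Renormalization: rescale the surviving mass so it sums to one again.
total = sum(unnormalized.values())
posterior = {h: mass / total for h, mass in unnormalized.items()}

print(posterior)  # roughly {'H1': 0.94, 'H2': 0.06, 'H3': 0.0}
```

That final rescaling is the step that's trivial to eyeball with lines but demanding with disk areas.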

David Udell's Shortform

Minor spoilers for mad investor chaos (Book 1) and the dath-ilani-verse generally.

When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they'd do that is that they want our resources and don't otherwise care who's using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they're worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.

--Eliezer, mad investor chaos

David Udell's Shortform

Spoilers for mad investor chaos (Book 2).

"Basic project management principles, an angry rant by Keltham of dath ilan, section one:  How to have anybody having responsibility for anything."

Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something.  If there is not one person responsible for something, it means nobody is responsible for it.  This is the proverb of dath ilani management.  Are three people responsible for something?  Maybe all three think somebody else was supposed to actually do it.

In companies large enough that they need regulations, every regulation has an owner.  There is one person who is responsible for that regulation and who supposedly thinks it is a good idea and who could nope the regulation if it stopped making sense.  If there's somebody who says, 'Well, I couldn't do the obviously correct thing there, the regulation said otherwise', then, if that's actually true, you can identify the one single person who owned that regulation and they are responsible for the output.

Sane people writing rules like those, for whose effects they can be held accountable, write the ability for the person being regulated to throw an exception which gets caught by an exception handler if a regulation's output seems to obviously not make sane sense over a particular event.  Any time somebody has to literally break the rules to do a saner thing, that represents an absolute failure of organizational design.  There should be explicit exceptions built in and procedures for them.

Exceptions, being explicit, get logged.  They get reviewed.  If all your bureaucrats are repeatedly marking that a particular rule seems to be producing nonsensical decisions, it gets noticed.  The one single identifiable person who has ownership for that rule gets notified, because they have eyes on that, and then they have the ability to optimize over it, like by modifying that rule.  If they can't modify the rule, they don't have ownership of it and somebody else is the real owner and this person is one of their subordinates whose job it is to serve as the other person's eyes on the rule. 

Cheliax's problem is that the question 'Well who's responsible then?' stopped without producing any answer at all.

This literally never happens in a correctly designed organization.  If you have absolutely no other idea of who is responsible, then the answer is that it is the job of Abrogail Thrune.  If you do not want to take the issue to Abrogail Thrune, that means it gets taken to somebody else, who then has the authority to make that decision, the knowledge to make that decision, the eyes to see the information necessary for it, and the power to carry out that decision.

Cheliax should have rehearsed this sort of thing by holding an Annual Nidal Invasion Rehearsal Festival, even if only Governance can afford to celebrate that festival and most tiny villages can't.  During this Festival, the number of uncaught messages getting routed to Abrogail Thrune would then have informed the Queen that there would be a predictable failure of organizational design in the event of large-scale catastrophe, in advance of that catastrophe actually occurring.

If literally everybody with the knowledge to make a decision is dead, it gets routed to somebody who has to make a decision using insufficient knowledge.

If a decision can be delayed … then that decision can be routed to some smarter or more knowledgeable person who will make the decision later, after they get resurrected.  But, like, even in a case like that, there should be one single identifiable person whose job it would be to notice if the decision suddenly turned urgent and grab it out of the delay queue.

--Eliezer Yudkowsky, mad investor chaos
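(A programming analogy to the explicit-exceptions point above, since the rant already uses the metaphor: a minimal sketch in which the class, field names, and example cases are invented for illustration, not taken from the text.)

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    owner: str                         # exactly one person owns each rule
    exception_log: list = field(default_factory=list)

    def apply(self, case: str, output_makes_sense: bool) -> str:
        if output_makes_sense:
            return f"{self.name} applied to {case}"
        # Don't silently break the rule: record an explicit exception
        # that gets routed to the rule's single owner for review.
        self.exception_log.append(case)
        return f"exception on {case}: routed to {self.owner} for review"

rule = Rule(name="form-27b", owner="rule-owner")
print(rule.apply("routine shipment", output_makes_sense=True))
print(rule.apply("city on fire", output_makes_sense=False))
# A growing exception_log is the owner's signal that the rule needs changing.
```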

AGI Ruin: A List of Lethalities

(I meant: What fields can we draw legible geniuses from into alignment?)

AGI Ruin: A List of Lethalities

"Geniuses" with nice legible accomplishments in fields with tight feedback loops where it's easy to determine which results are good or bad right away, and so validate that this person is a genius, are (a) people who might not be able to do equally great work away from tight feedback loops, (b) people who chose a field where their genius would be nicely legible even if that maybe wasn't the place where humanity most needed a genius, and (c) probably don't have the mysterious gears simply because they're rare.  You cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them.  They probably do not know where the real difficulties are, they probably do not understand what needs to be done, they cannot tell the difference between good and bad work, and the funders also can't tell without me standing over their shoulders evaluating everything, which I do not have the physical stamina to do.  I concede that real high-powered talents, especially if they're still in their 20s, genuinely interested, and have done their reading, are people who, yeah, fine, have higher probabilities of making core contributions than a random bloke off the street. But I'd have more hope - not significant hope, but more hope - in separating the concerns of (a) credibly promising to pay big money retrospectively for good work to anyone who produces it, and (b) venturing prospective payments to somebody who is predicted to maybe produce good work later.


What fields would qualify as "lacking tight feedback loops"? Computer security? Why don't, e.g., credentialed math geniuses leading their subfields qualify -- because math academia is already pretty organized and inventing a new subfield of math (or whatever) is just not in the same reference class of feat as Newton inventing mathematical physics from scratch?

(c) probably still holds even if there exists a promising class of legible geniuses, though.

Rationalism in an Age of Egregores

"You are intransigently trying to ignore the fact that what you are doing will have effects beyond those you're admitting to, and has different consequences for different groups (and usually just happens to give a relative advantage to you or to some group you're in)".

I don't think this is a very different claim from

"there are no interesting, purely predictive uses of language that avoid signaling in some group-status relevant manner; anyone claiming so is actually making a status move by trying to arrogate the title of 'neutral.'"


And even simple flat-out facts also have consequences for political disputes. Almost any fact is politically convenient or inconvenient for somebody, if for no other reason than that people have sometimes actually picked their political positions because of those facts. Somebody's choosing to assign salience to one set of facts, while ignoring another set of facts that are equally important, is almost always political. And people do do that while claiming neutrality.

There might be two senses of "political" that you're moving between here. The first is "dealing with political topics at all," and the second is "will redistribute status in the group-status game." I readily admit that simple, apparently apolitical facts can have political-affairs-relevant implications, and lead people to update their views on political affairs.

But it's this bit that I'm worried about:

Somebody's choosing to assign salience to one set of facts, while ignoring another set of facts that are equally important, is almost always political. And people do do that while claiming neutrality.

People can do this when they're in the trenches jockeying for status. But there's also such a thing as credibly signaling your apolitical status, and not selectively filtering evidence based on its implications for your egregore. We should try hard to do more of the latter, and opt out of the egregore-status game.

David Udell's Shortform

I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I'm not sure any of those proposed labels captures that well.

I like "apolitical" because it unequivocally states that you're not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high-ground in some underhanded way (which is more typical political behavior, and is thus expected). "Meritocratic," "humanist," "humanitarian," and maybe "open-minded" are all shot for that purpose, as they've been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn't too special in this regard) and connotate allegiance to some political tribes over others.

What I really want is an adjective that says "I'm completely tapping out of that game."

David Udell's Shortform

A decent handle for rationalism is 'apolitical consequentialism.'

'Apolitical' here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. 'Consequentialism' means getting more of what you want, whatever that is.

David Udell's Shortform

Become consequentialist enough, and it'll wrap back around to being a bit deontological.

[$20K in Prizes] AI Safety Arguments Competition

Machine learning researchers know how to keep making AI smarter, but have no idea where to begin with making AI loyal.
