All of David Udell's Comments + Replies

I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown nosers will be less productive than a competitor company of quiet hardworking employees! So there's a cooperate/defect-dilemma here.

What that suggests, I think, is that you generally shouldn't immediately defect as hard as possible, with regard to optimizing for appearances. Play the prev...

Good point. I am concerned that adding even a dash of legibility screws the work over completely and immediately and invisibly rather than incrementally. I may have over-analyzed my data so I should probably return to the field to collect more samples.

The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.

This is not a coincidence because nothing is a coincidence.

Two moments of growing in mathematical maturity I remember vividly:

  1. Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
  2. Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how , and  interrelat
...

I found it distracting that all your examples were topical, anti-red-tribe coded events. That reminded me of

In Artificial Intelligence, and particularly in the domain of nonmonotonic reasoning, there’s a standard problem: “All Quakers are pacifists. All Republicans are not pacifists. Nixon is a Quaker and a Republican. Is Nixon a pacifist?”

What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question? To make Republicans feel unwelcome in courses on Artificial Intelligenc

...
-3 [DEACTIVATED] Duncan Sabien 2mo
I don't subscribe to a stay-non-politicized discourse norm (often one tribe is Actually Being Worse, and I'm not going to handicap my ability to say true things, though it's worth doing so carefully), but I am also quite happy to edit to include anti-blue-tribe examples as you or others propose them. EDIT: Also, the linked FB post "get really mad about something stupid" is a blue tribe example, which I mention as evidence that I had previously written about the blue tribe being bad in this way all by itself. =)

2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.

a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs

12. The principal of a private school is a

...

Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.

If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...

Modest spoilers for planecrash (Book 9 -- null action act II).

Nex and Geb had each INT 30 by the end of their mutual war.  They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27.  And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly me

...

What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?

Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.

In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimen

...

Past historical experience and brainstorming about human social orders probably barely scratches the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.

  1. (Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)

Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.

This post crystallized some thoughts that have been floating in my head, inchoate, since I read Zvi's stuff on slack and Valentine's "Here's the Exit."

Part of the reason that it's so hard to update on these 'creative slack' ideas is that we make deals among our momentary mindsets to work hard when it's work-time. (And when it's literally the end of the world at stake, it's always work-time.) "Being lazy" is our label for someone who hasn't established that internal deal between their varying mindsets, and so is flighty and hasn't precommitted to getting st...

+1. I've explained a less clear/expansive version of this post to a few people this last summer. I think there is often some internal value-violence going on when many people fixate on Impact.

A model I picked up from Eric Schwitzgebel.

The humanities used to be highest-status in the intellectual world!

But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.

"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."

In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts

...
3 Alexander Gietelink Oldenziel 5mo
There is a third important aspect of functions-in-the-original-sense that distinguishes them from extensional functions (i.e. collections of input-output pairs): effects. Describing these 'intensional' features is an active area of research in theoretical CS. One important thread here is game semantics, which you might like to take a look at.

Given a transformer model, it's probably possible to find a reasonably concise energy function (probably of a similar OOM of complexity as the model weights themselves) whose minimization corresponds to executing forwards passes of the transformer. However, this [highly compressive] energy function wouldn't tell you much about what the personas simulated by the model "want" or how agentic they were, since the energy function is expressed in the ontology of model weights and activations, not an agent's beliefs / goals. [This has] the type signature of a uti

...

This is a great theorem that's stuck around in my head this last year! It's presented clearly and engagingly, but more importantly, the ideas in this piece are suggestive of a broader agent foundations research direction. If you wanted to intimate that research direction with a single short post that additionally demonstrates something theoretically interesting in its own right, this might be the post you'd share.

This post has successfully stuck around in my mind for two years now! In particular, it's made me explicitly aware of the possibility of flinching away from observations because they're normie-tribe-coded.

I think I deny the evidence on most of the cases of dogs generating complex English claims. But it was epistemically healthy for that model anomaly to be rubbed in my face, rather than filter-bubbled away plus flinched away from and ignored.

This is a fantastic piece of economic reasoning applied to a not-flagged-as-economics puzzle! As the post says, a lot of its content is floating out there on the internet somewhere: the draw here is putting all those scattered insights together under their common theory of the firm and transaction costs framework. In doing so, it explicitly hooked up two parts of my world model that had previously remained separate, because they weren't obviously connected.

As I have now written in a comment below, see my 2016 Ribbonfarm post about this.

Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in ℂ. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.

--Pugh, Real Mathematical Analysis (p. 28)

One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.

If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can work to get that person to think in terms of Bayesian epistemology and not decision theory if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas...

Thanks -- right on both counts! Post amended.

An Inconsistent Simulated World

I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).

Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.

What are the flaws you...

When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
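
One way to make this concrete is as a likelihood-ratio statement. The following is a minimal sketch, assuming (this framing is mine, not the comment's) that "could have just as easily been written" means the writeup is equally probable whether the claim is true or false. The function name and all numbers are hypothetical.

```python
def posterior(prior: float, p_writeup_if_true: float, p_writeup_if_false: float) -> float:
    """Bayes' rule: P(claim | a persuasive writeup for the claim exists)."""
    joint_true = prior * p_writeup_if_true
    joint_false = (1.0 - prior) * p_writeup_if_false
    return joint_true / (joint_true + joint_false)


prior = 0.3  # hypothetical prior credence in the claim

# Equally easy to write either way: likelihood ratio 1, so no update.
unmoved = posterior(prior, p_writeup_if_true=0.9, p_writeup_if_false=0.9)

# Harder to write when the claim is false: likelihood ratio 3, so a real update.
moved = posterior(prior, p_writeup_if_true=0.9, p_writeup_if_false=0.3)
```

Under these assumptions `unmoved` stays at the prior, while `moved` rises above it. The writeup's persuasiveness never enters the calculation, only how much likelier it is to exist if the claim is true.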

Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.

This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t...

A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.

Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.

Sometimes the relevant interpersonal parameters can be varied, and the institutional designs...

Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.

Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.

Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!

Since when was politics about just one person?

Stress and time-to-burnout are resources to be juggled, like any other.

“What is the world trying to tell you?”

I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.

As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use. In this sense, singular mathematics has necessarily a kind of anthropomorphic character; the question is not what is it, but rather how shall we define it so that it is in some way useful to us?

--E. T. Jaynes, Probability Theory (p. 108)

Bogus nondifferentiable functions

The case most often cited as an example of a nondifferentiable function is derived from a sequence {f_n(x)}, each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length 1/n. As n → ∞, the triangles shrink to zero size. For any finite n, the slope of f_n(x) is ±1 almost everywhere. Then what happens as n → ∞? The limit f_∞(x) is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivativ

...
2 David Udell 7mo

Epistemic status: politics, known mindkiller; not very serious or considered.

People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.

In the US, the 1st Amendment legall...

The explicit definition of an ordered pair ((a, b) = {{a}, {a, b}}) is frequently relegated to pathological set theory...

It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irreleva

...
3 Alexander Gietelink Oldenziel 7mo
Modern type theory mostly solves this blemish of set theory and is highly economical conceptually to boot. Most of the adherence to set theory is historical inertia, though some aspects of coding and presentations are important. Future foundations will improve our understanding of this latter topic.

Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?

Very cool! I have noticed that in arguments in ordinary academia people sometimes object that "that's so complicated" when I take a lot of deductive steps. I hadn't quite connected this with the idea that:

If you're confident in your assumptions ( is small), or if you're unconfident in your inferences ( is big), then you should penalise slow theories moreso than long theories, i.e. you should be a T-type.

I.e., that holding a T-type prior is adaptive when even your deductive inferences are noisy.

Also, I take it that this row of your table:

...
3 Cleo Nardo 7mo
yep. amended.

FWIW, this post strikes me as a very characteristically 'Hansonian' insight.

'Hansonian' meaning it explores a change that is a funny departure from a current equilibrium, but doesn't explain why it's only that change which is possible, rather than changing an underlying inefficiency or a different dimension of the equilibrium? Why have paid time off (or allowed time off) at all?  Why not bid for time off, with different multipliers depending on how many are out at once?  At the very least, why not measure the correlates of expense of time-off and adjust accordingly, rather than just the intuition that for some teams, one member missing has a disproportionate cost and that should be fixed by enforced scheduling.

Now, whatever A may assert, the fact that A can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, A could certainly be deduced from them!

This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s

...
The text is slightly in error. It is straightforward to construct a program that is guaranteed to locate an inconsistency if one exists: just have it generate all theorems and stop when it finds an inconsistency. The problem is that it doesn't ever stop if there isn't an inconsistency. This is the difference between decidability and semi-decidability. All the systems covered by Gödel's completeness and incompleteness theorems are semi-decidable, but not all are decidable.
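
The "generate all theorems and stop at an inconsistency" program can be sketched on a toy system. This is an illustration under loud assumptions, not a real proof checker: statements are nonzero integers, `n` and `-n` count as contradictory, and the single inference rule (from a positive `n`, derive `n + 1`) is invented for the example. The `budget` parameter is also an addition of mine, so that the demo terminates.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, Tuple


def enumerate_theorems(
    axioms: Iterable[int], step: Callable[[int], Iterable[int]]
) -> Iterator[int]:
    """Breadth-first enumeration of every statement derivable from the axioms."""
    seen = set()
    frontier = list(axioms)
    while frontier:
        next_frontier = []
        for theorem in frontier:
            if theorem in seen:
                continue
            seen.add(theorem)
            yield theorem
            next_frontier.extend(step(theorem))
        frontier = next_frontier


def find_inconsistency(
    axioms: Iterable[int],
    step: Callable[[int], Iterable[int]],
    budget: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return a contradictory pair if one is enumerated, else None.

    With budget=None this is the semi-decision procedure itself: it halts
    whenever a contradiction exists, and runs forever on a consistent system
    with infinitely many theorems. With a budget, None only means
    "nothing found yet", never "proved consistent".
    """
    proved = set()
    theorems = enumerate_theorems(axioms, step)
    if budget is not None:
        theorems = islice(theorems, budget)
    for theorem in theorems:
        if -theorem in proved:
            return (-theorem, theorem)
        proved.add(theorem)
    return None


def toy_rule(n: int) -> list:
    """Hypothetical inference rule: from a positive n, derive n + 1."""
    return [n + 1] if n > 0 else []


inconsistent = find_inconsistency([1, -3], toy_rule)      # halts on its own
no_answer_yet = find_inconsistency([1], toy_rule, budget=1000)  # cut off by the budget
```

The first call halts without any budget because the toy system eventually derives `3` alongside the axiom `-3`. The second would enumerate `1, 2, 3, ...` forever if the budget were removed, which is the asymmetry the comment describes.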

I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy.

These are different senses of "happy." It should really read:

I know forcing humans to smile doesn't make them , and I know what they should've written instead to get me to optimize for  as they intended, but they are .

They're different concepts, so there's no strangeness here. The AGI knows what you meant...

1 Q Home 7mo
I know that there's no strangeness from the formal point of view. But it doesn't mean there's no strangeness in general. Or that the situation isn't similar to the Moore paradox. Your examples are not 100% Moore statements either. Isn't the point of the discussion to find interesting connections between the Moore paradox and other things? I know that the classical way to formulate it is "AI knows, but doesn't care". I thought it may be interesting to formulate it as "AI knows, but doesn't believe". It may be interesting to think about what type of AI this formulation may be true for. For such an AI, alignment would mean resolving the Moore paradox. For example, imagine an AI with a very strong OCD to make people smile.

The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?

Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structur

...

Yeah, fair -- I dunno. I do know that an incremental improvement on simulating a bunch of people in an environment philosophizing is doing that but running an algorithm that prevents coercion, e.g.

I imagine that the complete theory of these incremental improvements (for example, also not running a bunch of moral patients for many subjective years while computing the CEV), is the final theory we're after, but I don't have it.

Like, encoding what "coercion" is would be an expression of values. It's more meta, and more universalizable, and stuff, but it's still something that someone might strongly object to, and so it's coercion in some sense. We could try to talk about what possible reflectively stable people / societies would consider as good rules for the initial reflection process, but it seems like there would be multiple fixed points, and probably some people today would have revealed preferences that distinguish those possible fixed points of reflection, still leaving open conflict.

Then that isn't the CEV operation.

The CEV operation tries to return a fixed point of idealized value-reflection. Running immortal people forward inside of a simulated world is very much insufficiently idealized value-reflection, for the reasons you suggest, so simply simulating people interacting for a long time isn't running their CEV.

How would you run their CEV? I'm saying it's not obvious how to do it in a way that both captures their actual volition, while avoiding coercion. You're saying "idealized reflection", but what does that mean?

Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.

Back and Forth

Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.

Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.

It's probably a useful mental technique to consider from both directions, but also consider that choices that appear symmetric at first glance may not actually be symmetric. There are often significant transition costs that may differ in each direction, as well as path dependencies that are not immediately obvious. As such, I completely disagree with the first paragraph of the post, but agree with the general principle of considering such decisions from both directions and thank you for posting it.

I had assumed the first -- they're afraid of imperfect-values lock-in. I think it's the "not to the problem of preventing complete disaster" phrase that tipped me off here.

The verdict that knowledge is purely a property of configurations cannot be naively generalized from real life to GPT simulations, because “physics” and “configurations” play different roles in the two (as I’ll address in the next post). The parable of the two tests, however, literally pertains to GPT. People have a tendency to draw erroneous global conclusions about GPT from behaviors which are in fact prompt-contingent, and consequently there is a pattern of constant discoveries that GPT-3 exceeds previously measured capabilities given alternate conditio

...

This kind of comment ("this precise part had this precise effect on me") is a really valuable form of feedback that I'd love to get (and will try to give) more often. Thanks! It's particularly interesting because someone gave feedback on a draft that the business about simulated test-takers seemed unnecessary and made things more confusing.

Since you mentioned it, I'm going to ramble on about some additional nuance on this point.

Here's an intuition pump which strongly discourages "fundamental attribution error" to the simulator:

Imagine a machine where you feed...

Reflexively check both sides of the proposed probability of an event:

"What do I think about P(DOOM) = 81%?"


"What do I think about P(~DOOM) = 19%?"

This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.

When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.

Bryan Caplan's writing about many people hating stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
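
A small worked example may make the comparative-advantage point concrete. This is a sketch with hypothetical numbers: the agents, the two goods `x` and `y`, the hourly rates, and the 8-hour day are all assumptions chosen for illustration. Alice is better than Bob at producing both goods, yet total output of both goods still rises when Bob specializes in the good he gives up least to make.

```python
HOURS = 8  # hypothetical working hours per agent

# Hypothetical units produced per hour; Alice has the absolute advantage in both goods.
RATES = {
    "alice": {"x": 10, "y": 10},
    "bob": {"x": 2, "y": 6},
}


def opportunity_cost_of_x(agent: str) -> float:
    """Units of y the agent forgoes to make one unit of x."""
    return RATES[agent]["y"] / RATES[agent]["x"]


def total_output(hours_on_x: dict) -> dict:
    """Combined output when each agent spends the given hours on x and the rest on y."""
    totals = {"x": 0, "y": 0}
    for agent, hours_x in hours_on_x.items():
        totals["x"] += RATES[agent]["x"] * hours_x
        totals["y"] += RATES[agent]["y"] * (HOURS - hours_x)
    return totals


# No trade: each agent splits the day evenly between the two goods.
autarky = total_output({"alice": 4, "bob": 4})

# Trade: Bob makes only y (his comparative advantage); Alice shifts toward x.
specialized = total_output({"alice": 5, "bob": 0})
```

With these numbers, `autarky` comes to 48 of `x` and 64 of `y`, while `specialized` comes to 50 of `x` and 78 of `y`. The surplus exists even though Bob is below Alice on every measure, which is the sense in which the individual interaction is positive-sum regardless of where either agent sits relative to an average.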

I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology... consists of many such impossible-to-instill properties.

This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.

I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It confers irrevocable, invokable protection from having to think about predictably confused claims ever again.

Take those cognitive cycles saved, and spend them well!

I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"

The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa...
