The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.
Two moments of growth in mathematical maturity that I remember vividly:
I found it distracting that all your examples were topical, anti-red-tribe coded events. That reminded me of
...In Artificial Intelligence, and particularly in the domain of nonmonotonic reasoning, there’s a standard problem: “All Quakers are pacifists. All Republicans are not pacifists. Nixon is a Quaker and a Republican. Is Nixon a pacifist?”
What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question? To make Republicans feel unwelcome in courses on Artificial Intelligence...
...2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fires the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs…
12. The principal of a private school is a...
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what a rationalizing search process implies about the arguments it turns up. Compared to academic philosophers, rationalists strike me as especially alert to filtered evidence and its significance for your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...
Modest spoilers for planecrash (Book 9 -- null action act II).
...Nex and Geb had each INT 30 by the end of their mutual war. They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly me
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.
...In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimen
Historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.
(Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)
Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.
This post crystallized some thoughts that have been floating in my head, inchoate, since I read Zvi's stuff on slack and Valentine's "Here's the Exit."
Part of the reason that it's so hard to update on these 'creative slack' ideas is that we make deals among our momentary mindsets to work hard when it's work-time. (And when it's literally the end of the world at stake, it's always work-time.) "Being lazy" is our label for someone who hasn't established that internal deal between their varying mindsets, and so is flighty and hasn't precommitted to getting st...
A model I picked up from Eric Schwitzgebel.
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."
...In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts
...Given a transformer model, it's probably possible to find a reasonably concise energy function (probably of a similar OOM of complexity as the model weights themselves) whose minimization corresponds to executing forwards passes of the transformer. However, this [highly compressive] energy function wouldn't tell you much about what the personas simulated by the model "want" or how agentic they were, since the energy function is expressed in the ontology of model weights and activations, not an agent's beliefs / goals. [This has] the type signature of a uti
This is a great theorem that's stuck around in my head this last year! It's presented clearly and engagingly, but more importantly, the ideas in this piece are suggestive of a broader agent foundations research direction. If you wanted to intimate that research direction with a single short post that additionally demonstrates something theoretically interesting in its own right, this might be the post you'd share.
This post has successfully stuck around in my mind for two years now! In particular, it's made me explicitly aware of the possibility of flinching away from observations because they're normie-tribe-coded.
I think I deny the evidence in most of the cases of dogs generating complex English claims. But it was epistemically healthy for that model anomaly to be rubbed in my face, rather than filter-bubbled away, flinched away from, and ignored.
This is a fantastic piece of economic reasoning applied to a not-flagged-as-economics puzzle! As the post says, a lot of its content is floating out there on the internet somewhere: the draw here is putting all those scattered insights together within a common framework of the theory of the firm and transaction costs. In doing so, it explicitly hooked up two parts of my world model that had previously remained separate, because they weren't obviously connected.
Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in ℂ. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.
--Pugh, Real Mathematical Analysis (p. 28)
One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.
If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can only get that person to think in terms of Bayesian epistemology, rather than decision theory, if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas...
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you...
When another article of equal argumentative caliber could just as easily have been written for the negation of a claim, that writeup is no evidence for its claim.
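To put the point in likelihood-ratio terms (a quick sketch; here E is shorthand for "an article of this caliber exists arguing for the claim" and H is the claim itself):

```latex
\frac{P(H \mid E)}{P(\lnot H \mid E)}
  = \frac{P(E \mid H)}{P(E \mid \lnot H)} \cdot \frac{P(H)}{P(\lnot H)},
\qquad
P(E \mid H) \approx P(E \mid \lnot H)
\;\Longrightarrow\;
\text{posterior odds} \approx \text{prior odds}.
```

If an equally persuasive article would have shown up whether or not the claim were true, the likelihood ratio is about 1 and your odds shouldn't move.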
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.
This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t...
A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.
Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.
Sometimes the relevant interpersonal parameters can be varied, and the institutional designs...
Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.
Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.
Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!
“What is the world trying to tell you?”
I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.
As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use. In this sense, singular mathematics has necessarily a kind of anthropomorphic character; the question is not what is it, but rather how shall we define it so that it is in some way useful to us?
--E. T. Jaynes, Probability Theory (p. 108)
...Bogus nondifferentiable functions
The case most often cited as an example of a nondifferentiable function is derived from a sequence of functions f_1(x), f_2(x), ..., each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length tending to zero. As n → ∞, the triangles shrink to zero size. For any finite n, the slope of f_n(x) is ±1 almost everywhere. Then what happens as n → ∞? The limit f_∞(x) is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivative...
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legall...
...The explicit definition of an ordered pair is frequently relegated to pathological set theory...
It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irreleva
Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Very cool! I have noticed that in arguments in ordinary academia people sometimes object that "that's so complicated" when I take a lot of deductive steps. I hadn't quite connected this with the idea that:
If you're confident in your assumptions (your per-assumption error rate is small), or if you're unconfident in your inferences (your per-inference error rate is large), then you should penalise slow theories more than long theories, i.e. you should be a T-type.
I.e., that holding a T-type prior is adaptive when even your deductive inferences are noisy.
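To spell out my reading in symbols of my own (not the original post's notation): suppose an argument rests on A assumptions, each wrong with probability ε_A, and I deductive steps, each fumbled with probability ε_I. Then, roughly,

```latex
P(\text{argument sound}) \approx (1-\varepsilon_A)^{A}\,(1-\varepsilon_I)^{I}
\approx 1 - A\varepsilon_A - I\varepsilon_I
\quad \text{for small } \varepsilon_A, \varepsilon_I.
```

When ε_A is small or ε_I is large, the I·ε_I term dominates, so long chains of inference (slow theories) get discounted more heavily than large stocks of assumptions (long theories, on a crude reading), which is just the T-type prior.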
Also, I take it that this row of your table:
Debate | K |
...Now, whatever T may assert, the fact that T can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, T could certainly be deduced from them!
This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s
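In symbols (my gloss, not Jaynes's): write A for the axiom system and T for any theorem. Since a contradiction entails every proposition,

```latex
(A \vdash \bot) \;\Rightarrow\; (A \vdash T)\ \text{for every } T,
\qquad\text{so}\qquad
(A \vdash T) \;\not\Rightarrow\; \mathrm{Con}(A).
```

Deducing T from the axioms therefore can't certify that the axioms are consistent.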
I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy.
These are different senses of "happy." It should really read:
I know forcing humans to smile doesn't make them happy₁ (happy in the sense the humans meant), and I know what they should've written instead to get me to optimize for happy₁ as they intended, but they are happy₂ (happy in the sense actually written into my objective).
They're different concepts, so there's no strangeness here. The AGI knows what you meant...
...The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?
Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structur
Yeah, fair -- I dunno. I do know that an incremental improvement on simulating a bunch of people philosophizing in an environment would be doing that while also running an algorithm that, e.g., prevents coercion.
I imagine that the complete theory of these incremental improvements (for example, also not running a bunch of moral patients for many subjective years while computing the CEV) is the final theory we're after, but I don't have it.
Then that isn't the CEV operation.
The CEV operation tries to return a fixed point of idealized value-reflection. Running immortal people forward inside of a simulated world is very much insufficiently idealized value-reflection, for the reasons you suggest, so simply simulating people interacting for a long time isn't running their CEV.
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
Only make choices whose reverse you wouldn't make if your situation were flipped. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
I had assumed the first -- they're afraid of imperfect-values lock-in. I think it's the "not to the problem of preventing complete disaster" phrase that tipped me off here.
...The verdict that knowledge is purely a property of configurations cannot be naively generalized from real life to GPT simulations, because “physics” and “configurations” play different roles in the two (as I’ll address in the next post). The parable of the two tests, however, literally pertains to GPT. People have a tendency to draw erroneous global conclusions about GPT from behaviors which are in fact prompt-contingent, and consequently there is a pattern of constant discoveries that GPT-3 exceeds previously measured capabilities given alternate conditio
This kind of comment ("this precise part had this precise effect on me") is a really valuable form of feedback that I'd love to get (and will try to give) more often. Thanks! It's particularly interesting because someone gave feedback on a draft that the business about simulated test-takers seemed unnecessary and made things more confusing.
Since you mention it, I'm going to ramble on about some additional nuance on this point.
Here's an intuition pump which strongly discourages committing the "fundamental attribution error" toward the simulator:
Imagine a machine where you feed...
Reflexively check both sides of the proposed probability of an event:
"What do I think about P(DOOM) = 81%?"
and
"What do I think about P(~DOOM) = 19%?"
This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.
When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.
Bryan Caplan's writing about many people hating stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
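A toy worked example of the Ricardian point, with made-up numbers (a sketch of the standard textbook argument, not anything specific from Caplan): even when one agent is more productive at everything, both parties end up at least as well off in every good, and strictly better off in one, by specializing along comparative advantage and trading.

```python
# Toy Ricardian example (illustrative numbers only): agent A is more
# productive at BOTH goods, yet both agents end up with weakly more of
# every good when they specialize along comparative advantage and trade.

HOURS = 10  # hours each agent works

# Units produced per hour of work.
rate = {
    "A": {"bread": 10, "cloth": 2},  # better at both goods
    "B": {"bread": 1, "cloth": 1},   # worse at both goods
}

# Autarky: each agent splits their hours evenly between the two goods.
autarky = {
    agent: {good: r * HOURS / 2 for good, r in rates.items()}
    for agent, rates in rate.items()
}
# A: 50 bread, 10 cloth; B: 5 bread, 5 cloth.

# B's opportunity cost of cloth (1 loaf per bolt) is lower than A's
# (5 loaves per bolt), so B specializes fully in cloth and sells 5 bolts
# to A at 2 loaves per bolt (any price between 1 and 5 works).
price = 2       # loaves per bolt
cloth_sold = 5

b_final = {
    "bread": cloth_sold * price,                       # 10 loaves bought
    "cloth": rate["B"]["cloth"] * HOURS - cloth_sold,  # 10 made - 5 sold
}

# A now needs only 5 bolts of home-made cloth, freeing hours for bread.
a_cloth_hours = (autarky["A"]["cloth"] - cloth_sold) / rate["A"]["cloth"]  # 2.5 h
a_final = {
    "bread": rate["A"]["bread"] * (HOURS - a_cloth_hours) - cloth_sold * price,  # 75 - 10
    "cloth": rate["A"]["cloth"] * a_cloth_hours + cloth_sold,                    # 5 + 5
}

# Both final bundles weakly dominate the autarky bundles, good by good.
for agent, final in [("A", a_final), ("B", b_final)]:
    for good in ("bread", "cloth"):
        assert final[good] >= autarky[agent][good], (agent, good)

print("A:", autarky["A"], "->", a_final)  # bread 50 -> 65, cloth 10 -> 10
print("B:", autarky["B"], "->", b_final)  # bread  5 -> 10, cloth  5 -> 5
```

The gains come entirely from the two agents' differing opportunity costs, not from anyone's absolute skill, which is why comparisons to the population average are beside the point.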
I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology... consists of many such impossible-to-instill properties.
This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.
I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It confers irrevocable, invokable protection from having to think about predictably confused claims ever again.
Take those cognitive cycles saved, and spend them well!
I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"
The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa...
I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown nosers will be less productive than a competitor company of quiet hardworking employees! So there's a cooperate/defect-dilemma here.
What that suggests, I think, is that you generally shouldn't immediately defect as hard as possible, with regard to optimizing for appearances. Play the prev... (read more)