Jason Gross




I think this factoring hides the computational content of Löb's theorem (or at least doesn't make it obvious).  Namely, that if you have , then Löb's theorem is just the fixpoint of this function.

Here's a one-line proof of Löb's theorem, which is basically the same as the construction of the Y combinator (h/t Neel Krishnaswami's blogpost from 2016):

where  is applying internal necessitation to , and .fwd (.bak) is the forward (resp. backward) direction of the point-surjection .
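To make the Y-combinator analogy concrete, here is a minimal Python sketch of the diagonal construction. It is not the proof itself: the names `fix` and `factorial` are illustrative, untyped self-application stands in for the point-surjection, and the inner lambda is eta-expanded only because Python evaluates strictly.

```python
def fix(f):
    """Return a fixpoint of f, built by the same diagonal trick as the Y combinator."""
    # g(x) = f(x applied to itself); the wrapper delays x(x) until it is called.
    g = lambda x: f(lambda *args: x(x)(*args))
    # Feeding g to itself yields a value satisfying fix(f) = f(fix(f)).
    return g(g)


# Illustrative use: recursion recovered purely from the fixpoint.
factorial = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

The self-application `g(g)` is the step that corresponds to applying the surjection to its own preimage.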

The relevant tradeoff to consider is the cost of prediction and the cost of influence.  As long as the cost of predicting an "impressive output" is much lower than the cost of influencing the world such that an easy-to-generate output is considered impressive, then it's possible to generate the impressive output without risking misalignment by bounding optimization power at lower than the power required to influence the world.

So you can expect an impressive AI that predicts the weather but isn't allowed to, e.g., participate in prediction markets on the weather nor charter flights to seed clouds to cause rain, without needing to worry about alignment.  But don't expect alignment-irrelevance from a bot aimed at writing persuasive philosophical essays, nor an AI aimed at predicting the behavior of the stock market conditional on the trades it tells you to make, nor an AI aimed at predicting the best time to show you an ad for the AI's highest-paying company.

No. The content of the comment is good. The bad part is that it was made in response to a comment that was not requesting a response or further elaboration or discussion (or at least not doing so explicitly; the quoted comment does not explicitly point at any part of the comment it's replying to as being such a request). My read of the situation is that person A shared their experience in a long comment, and person B attempted to shut them down / socially-punish them / defend against the comment by replying with a good statement about unhealthy dynamics, implying that person A was playing into that dynamic, without specifying how person A played into that dynamic, when it seems to me that in fact person A was not part of that dynamic and person B was defending themselves without actually saying what they're protecting nor how it's being threatened. This strikes me as bad form, and I believe it's what Duncan is pointing at.

Where bad commentary is not highly upvoted just because our monkey brains are cheering, and good commentary is not downvoted or ignored just because our monkey brains boo or are bored.

Suggestion: give our monkey brains a thing to do that lets them follow incentives while supporting (or at least not interfering with) the goal. Some ideas:

  • split upvotes into "this comment has the Right effect on tribal incentives" and "after separating out its impact on what side the reader updates towards, this comment is still worth reading"
  • split upvotes into flair (a la Basecamp), letting people indicate whether the upvote is "go team!" or "this made me think" or "good point" or "good point but bad technique", etc.

Option number 3 seems like more-or-less a real option to me, given that "this document" is the official document prepared and published by the CDC a decade or two ago, and "sensible scientist-policymakers like myself" includes any head of the CDC back when the position was for career-civil-servants rather than presidential appointees, and also includes the task force that the Bush Administration specifically assembled to generate this document, and also included person #2 in California's public health apparatus (who was passed over for becoming #1 because she was too blond / not racially diverse enough, and who was later cut out of the relevant meetings by her new boss).

Edit: Also, the "guard it from anything that could derail their benevolent behavior" is not necessary; all that's needed here is to actually give them enough power / rope to hang themselves to let them implement the plan.

The Competent Machinery did exist; it just wasn't competent enough to overcome the fact that the rest of the government machinery was obstructing it. The plan for social distancing to deal with pandemics was created during the Bush administration, and there were people in government trying to implement the plan in ... mid-January, if I recall correctly (might have been mid-February). If, for example, the government had made an exception to medical privacy laws specifically for reporting the approximate address of positive COVID tests, and the CDC / government had not forbidden independent COVID testing in the early days, we probably would have been able to actually stamp out COVID. (Source: The Premonition: A Pandemic Story (it's an excellent book, and I highly recommend it))

Some extra nuance for your examples:

There is a substance XYZ, it's called "anti-water", it filling the hole of water in twin-Earth mandates that twin-Earth is made entirely of antimatter, and then the only problem is that the vacuum of space isn't vacuum enough (e.g., solar wind (I think that's what it's called), if nothing else, would make that Earth explode). More generally, it ought to be possible to come up with a physics where all the fundamental particles have an extra "tag" that carries no role (which in practice, I think, means that it functions just to change the number of microstates when particles with different tags are mixed --- I once tried to figure out what sort of measurement would be needed to determine empirically whether a glass of water in fact had only one kind of water, or had multiple kinds of otherwise-identical water, but have not been able to understand chemical potential enough to finish the thought experiment). Maybe furthermore there's some complicated force acting on the tags that changes them when the density of a particular tag is high enough, so that the tag difference between our Earth and twin-Earth can be maintained. We just have no evidence of such an attribute, hence Occam's razor presumes it to not exist.

I keep meaning to (re)work out the details on the gyroscope example; I think it should follow basically just from F = ma and the rigid body approximation (or maybe springs, if we skip rigid bodies), which means that denying gyroscopic precession basically breaks all of physics that involves objects in motion.

I think a better steelman in Example 1: Price Gouging, is that the law is meant to prevent rent-seeking, i.e., prevent people extracting money from the system without providing commensurate value. (The only example here that I understand even partially is landlords charging rent just because they own the land, and one fix to this is the land-value tax -- see the ACX book review of Progress and Poverty for an excellent explanation. It feels like there should be some analogue here, but I can't model enough economic nuance in my head to generate it and I'm not familiar enough with economics to tease it out.)

In Example 2: An orphan, or an abortion?, there's a further interesting note that outlawing abortion increases crime a decade or two later, because the children who would have been aborted are the ones who are most likely to grow up to become criminals. (Source: Freakonomics)

I think the thing you're looking for is traditionally called "third-party punishment" or "altruistic punishment", cf. https://en.wikipedia.org/wiki/Third-party_punishment . Wikipedia cites Bendor, Jonathan; Swistak, Piotr (2001). "The Evolution of Norms". American Journal of Sociology. 106 (6): 1493–1545. doi:10.1086/321298, which seems at least moderately non-technical at a glance.


I think I first encountered this in my Moral Psychology class at MIT (syllabus at http://web.mit.edu/holton/www/courses/moralpsych/home.html ), and I believe the citation was E. Fehr & U. Fischbacher 'The Nature of Human Altruism' Nature 425 (2003) 785-91. The bottom of the first paragraph on page 787 in https://www.researchgate.net/publication/9042569_The_Nature_of_Human_Altruism ("In fact, it can be shown theoretically that even a minority of strong reciprocators suffices to discipline a majority of selfish individuals when direct punishment is possible.") seems related but not exactly what you're looking for.

I think another interesting datapoint is to look at where our hard-science models are inadequate because we haven't managed to run the experiments that we'd need to (even when we know the theory of how to run them). The main areas that I'm aware of are high-energy physics looking for things beyond the Standard Model (the LHC was an enormous undertaking and I think the next step up in particle accelerators requires building one the size of the moon or something like that), gravitational waves (similar issues of scale), and quantum gravity (similar issues + how do you build an experiment to actually safely play with black holes?!). On the other hand, astrophysics manages to do an enormous amount (star composition, expansion rate of the universe, planetary composition) with literally no ability to run experiments and very limited ability to observe. (I think a particularly interesting case was the discovery of dark matter (which we actually still don't have a model for), which we discovered, iirc, by looking at a bunch of stars in the Milky Way and determining their velocity as a function of distance from the center by (a) looking at which wavelengths of light were missing to determine their velocity away/towards us (the elements that make up a star have very specific wavelengths that they absorb, so we can tell the chemical composition of a star by looking at the pattern of what wavelengths are missing, and we can get velocity/redshift/blueshift by looking at how far off those wavelengths are from what they are in the lab), (b) picking out stars of colors that we know come only in very specific brightnesses so that we can use apparent brightness to determine how far away the star is, (c) using its position in the night sky to determine what vector to use so we can position it relative to the center of the galaxy, and finally (d) noticing that the velocity as a function of radius is very very different from what it would be if the only mass causing gravitational pull were the visible star mass, and then inverting the plot to determine the spatial distribution of this newfound "dark matter". I think it's interesting and cool that there's enough validated shared model built up in astrophysics that you can stick a fancy prism in front of a fancy eye and look at the night sky and from what you see infer facts about how the universe is put together. Is this sort of thing happening in biology?)
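The "inverting the plot" step can be sketched in a few lines of Python. This is only an illustration under assumptions that are mine, not the comment's: circular orbits and a spherically symmetric mass distribution, so that Newtonian gravity gives v² = G·M(r)/r. The function names are hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def enclosed_mass(v, r):
    """Mass (kg) inside radius r (m) implied by circular orbital speed v (m/s).

    Assumes circular orbits and spherical symmetry: v^2 = G * M(r) / r.
    """
    return v * v * r / G


def keplerian_speed(mass, r):
    """Orbital speed (m/s) at radius r if all the mass sat at the center."""
    return math.sqrt(G * mass / r)
```

If the visible mass were all there is, speeds well outside the bright part of the galaxy would fall off like `keplerian_speed`, roughly as 1/sqrt(r). A speed that stays flat with radius instead makes `enclosed_mass` grow in proportion to r, which is the mismatch described in step (d).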

By the way,

The normal tendency to wake up feeling refreshed and alert gets exaggerated into a sudden irresistible jolt of awakeness.

I'm pretty sure this is wrong. I'll wake up feeling unable to go back to sleep, but not feeling well-rested and refreshed. I imagine it's closer to a caffeine headache? (I feel tired and headachy but not groggy.) So, at least for me, this is a body clock thing, and not a transient effect.
