Mateusz Bagiński


This was probably a factor but also:

IIRC, research at WIV was done in collaboration with EcoHealth Alliance and/or other similar US-based orgs. Granting WIV the BSL-4 status necessary for this kind of gain-of-function research was in part based on their assessments. The US establishment had a reason to cover it up because it was in part their own fuckup.

There doesn’t seem to be a consensus on what philosophy does or even what it is. One view of philosophy is that it is useless, or actively unhelpful, for alignment (at least of the ‘literally-don’t-kill-everyone’ variety, particularly if one’s timelines are short): it isn’t quantifiable, involves interminable debates, and talks about fuzzy-bordered concepts, sometimes using mismatched taxonomies, ontologies, or fundamental assumptions.


IMO it's accurate to say that philosophy (or at least the kind of philosophy that I find thought-worthy) is a category that includes high-level theoretical thinking that either (1) doesn't fit neatly into any of the existing disciplines (at least not yet) or (2) is strongly tied to one or some of them but engages in high-level theorizing/conceptual engineering/clarification/reflection to an extent that is not typical of that discipline ("philosophy of [biology/physics/mind/...]").

(1) is also contiguous with the history of the concept. At some point, all of science (perhaps except mathematics) was "(natural) philosophy". Then various (proto-)sciences started crystallizing, and whatever was not seen as deserving of its own department remained in the philosophy bucket.

I wonder how the following behavioral patterns fit into Shard Theory:

  • Many mammalian species have a strong default aversion to the young of their own species. They (including females) deliberately avoid contact with the young and can even be infanticidal. Physiological events associated with pregnancy (mostly hormonal) rewire the mother's brain such that when she gives birth, she immediately takes care of the young, grooms them, etc. — something she has never done before. Do you think this can be explained by a rewiring of her reward circuit such that she finds simple actions associated with the pups highly rewarding, and then bootstraps to learning complex behaviors from that?
  • Salt-starved rats develop an appetite for salt and are drawn to stimuli predictive of extremely salty water, even though on all previous occasions they found it extremely aversive, which caused them to develop a conditioned fear response to the cue predictive of salty water. (See Steve's post on this experiment.)

Maybe also a reminder about the comments to which you've reacted with that. E.g., if you haven't replied in a week or so (this could be configurable per user or something).

Unfortunately, it has a problem of its own: it's sensitive to our choice of the outcome set $S$. By adding some made-up element to $S$ with large negative utility and zero probability of occurring, we can make OP arbitrarily low. In that case, basically all of the default relative expected utility comes from avoiding the worst outcome, which is guaranteed, so you don't get any credit for optimizing.


What if we measure the utility of an outcome relative not to the worst one but to the status quo, i.e., the outcome that would happen if we did nothing/took the null action?

In that case, adding or subtracting outcomes to/from $S$ doesn't change the measure for outcomes that were already in $S$, as long as the default outcome also remains in $S$.

Obviously, this means that the measure assigned to any outcome depends on the choice of default outcome. But I think that's OK? If I have $1000 and increase my wealth to $1,000,000, then I think I "deserve" to be assigned more optimization power than if I had $1,000,000 and did nothing, even if the absolute utility I get from having $1,000,000 is the same.
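As a toy illustration of the two normalizations (all outcome names and utility values below are made up for the example, not taken from the original discussion):

```python
# Toy sketch: scoring an achieved outcome relative to a baseline outcome.
# Utilities and outcome names are illustrative assumptions only.

def relative_gain(utility, achieved, baseline):
    """Utility of the achieved outcome minus utility of the baseline."""
    return utility[achieved] - utility[baseline]

utility = {"worst": -1_000_000, "status_quo": 1_000, "great": 1_000_000}

# Normalizing against the worst outcome: scores are dominated by the
# (guaranteed) avoidance of "worst", and adding an even worse made-up
# outcome to the set would inflate every score further.
print(relative_gain(utility, "status_quo", "worst"))  # 1001000
print(relative_gain(utility, "great", "worst"))       # 2000000

# Normalizing against the status quo (null action): doing nothing scores
# zero, and the score is unaffected by adding/removing other outcomes,
# as long as the status quo stays in the set.
print(relative_gain(utility, "status_quo", "status_quo"))  # 0
print(relative_gain(utility, "great", "status_quo"))       # 999000
```

Under the first baseline, "doing nothing" still gets a large score just for the worst outcome not happening; under the second, only actual improvement over the default counts.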

Is this post supposed to communicate something like Stranger Than History?

Correction: this is TEDx, a more local, less official version of TED. [Edit:] Apparently it's actually TED; I only looked at the channel name. Sorry for the confusion.

Zvi will update the post if Yann responds further in the thread with Eliezer, but there will be no new Zvi posts centered on Yann.

Note that such a constellation would likely be unstable if the intelligence and capabilities of the AI increase over time. The humans in the man-machine system would depend more and more on the AI and be less and less in control, up to the point where humans are no longer needed and the uncontrollable man-machine system transforms into an uncontrollable autonomous AI.

It would probably be quite easy to train a GPT (e.g., a decision transformer) to predict the actions made by the human components of the system, so the assumptions required for claiming that such a system would be unstable are minimal.
