There doesn’t seem to be a consensus on what philosophy does, or even what it is. One view is that philosophy is useless, or actively unhelpful, for alignment (at least of the ‘literally-don’t-kill-everyone’ variety, particularly if one’s timelines are short): it isn’t quantifiable, involves interminable debates, and talks about fuzzy-bordered concepts, sometimes using mismatched taxonomies, ontologies, or fundamental assumptions.
IMO it's accurate to say that philosophy (or at least the kind of philosophy I find thought-worthy) is a category that includes high-level theoretical thinking that either (1) doesn't fit neatly into any of the existing disciplines (at least not yet) or (2) is strongly tied to one or more of them but engages in high-level theorizing/conceptual engineering/clarification/reflection to an extent that is not typical of that discipline ("philosophy of [biology/physics/mind/...]").
(1) is also contiguous with the history of the concept. At some point, all of science (except perhaps mathematics) was "(natural) philosophy". Then various (proto-)sciences started crystallizing out, and whatever was not seen as deserving its own department remained in the philosophy bucket.
I wonder how the following behavioral patterns fit into Shard Theory.
Maybe also a reminder about the comments you've reacted to with that, e.g., if you haven't replied in a week or so (the interval could be configurable per user or something).
Unfortunately, it has a problem of its own: it’s sensitive to our choice of the outcome set. By adding some made-up element with large negative utility and zero probability of occurring to the set, we can make OP arbitrarily low. In that case, basically all of the default relative expected utility comes from avoiding the worst outcome, which is guaranteed, so you don’t get any credit for optimising.
What if we measure the utility of an outcome relative not to the worst one but to the status quo, i.e., the outcome that would happen if we did nothing/took the null action?
In that case, adding outcomes to the set or removing them doesn't change the measure for outcomes that were already in the set, as long as the default outcome also remains in it.
Obviously, this means that the measure for any outcome depends on the choice of the default outcome. But I think that's OK? If I have $1,000 and increase my wealth to $1,000,000, then I think I "deserve" to be assigned more optimization power than if I had $1,000,000 and did nothing, even if the absolute utility I get from having $1,000,000 is the same.
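A toy calculation makes the contrast concrete. The outcome names, utilities, and probabilities below are all made up for illustration; "relative expected utility" here just means expected utility minus a baseline, with the baseline being either the worst outcome in the set or the status quo:

```python
def relative_eu(dist, utility, baseline_utility):
    """Expected utility of an outcome distribution, minus a baseline utility."""
    return sum(p * (utility[o] - baseline_utility) for o, p in dist.items())

# Hypothetical outcome set and utilities.
utility = {"bad": 0.0, "status_quo": 10.0, "good": 100.0}
action = {"good": 0.9, "status_quo": 0.1}   # what the optimiser achieves
null = {"status_quo": 1.0}                  # doing nothing

# Baseline = worst outcome in the set.
worst = min(utility.values())
relative_eu(action, utility, worst)  # credit mostly for reaching "good"
relative_eu(null, utility, worst)    # small credit for the null action

# Add a made-up outcome with huge negative utility and zero probability:
# now both scores are dominated by "avoiding doom", which is guaranteed,
# so the optimiser gets almost no credit over doing nothing (in relative terms).
utility["doom"] = -1e9
worst = min(utility.values())
relative_eu(action, utility, worst)  # huge, driven by the -1e9 baseline
relative_eu(null, utility, worst)    # almost the same huge number

# Baseline = status quo: unaffected by the extra zero-probability outcome.
sq = utility["status_quo"]
relative_eu(action, utility, sq)  # positive: credit for improving on the status quo
relative_eu(null, utility, sq)    # exactly zero: no credit for doing nothing
```

The status-quo baseline is what makes the last two numbers invariant under adding or removing outcomes, as long as the status-quo outcome stays in the set.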
Correction: this is TEDx, a more local, less official version of TED.
Apparently, it's TED; I had only looked at the channel name. Sorry for the confusion.
Zvi will update the post if Yann responds further in the thread with Eliezer, but there will be no new Zvi posts centered on Yann.
Note that such a constellation would likely be unstable if the intelligence and capabilities of the AI increase over time. The humans in the man-machine system would depend more and more on the AI and be less and less in control, up to the point where humans are not needed anymore and the uncontrollable man-machine system transforms into an uncontrollable autonomous AI.
It would probably be quite easy to train a GPT (e.g., a decision transformer) to predict the actions of the human components of the system, so the assumptions required for claiming that such a system would be unstable are minimal.
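The core idea is behavioral cloning: fit a model to logged (state, action) pairs from the human operators. A real system would use a sequence model such as a decision transformer; the sketch below reduces the idea to its simplest possible form, a frequency table over a hypothetical toy log (all states and actions are made up):

```python
from collections import Counter, defaultdict

# Hypothetical logs of (system_state, human_action) pairs; in practice these
# would be recorded decisions of the human components of the man-machine system.
logs = [
    ("alert:low", "ignore"),
    ("alert:low", "ignore"),
    ("alert:high", "escalate"),
    ("alert:high", "escalate"),
    ("alert:high", "ignore"),
]

# Behavioral cloning in miniature: estimate P(action | state) from
# frequencies, then predict the most common action for each state.
counts = defaultdict(Counter)
for state, action in logs:
    counts[state][action] += 1

def predict(state):
    """Most frequent human action recorded for this state."""
    return counts[state].most_common(1)[0][0]

predict("alert:high")  # returns "escalate"
```

Once such a predictor matches the humans well enough, the human components stop being load-bearing, which is the instability the comment above points at.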
This was probably a factor, but also:
IIRC, research at WIV was done in collaboration with EcoHealth Alliance and/or other similar US-based orgs. Granting WIV the BSL-4 certification necessary for this kind of gain-of-function research was based in part on their assessments. The US establishment had a reason to cover it up because it was partly their own fuckup.