Wiki Contributions


The Achilles Heel Hypothesis for AI

It's really nice to hear that the paper seems clear! Thanks for the comment. 

  • I've been working on this since March, but at a very slow pace, and I took a few hiatuses. Most days when I'd work on it, it was for less than an hour. After coming up with the initial framework to tie things together, the hardest part was trying and failing to think of interesting ways in which most of the Achilles heels presented could be used as novel containment measures. I discuss this a bit in the discussion section.

For 2-3, I can give some thoughts, but these aren't necessarily thought through much more than they would be by many other people one could ask. 

  • I would agree with this. For an agent to even have a notion of being turned off, it would need some sort of model that accounts for this but which isn't learned via experience in a typical episodic learning setting (clearly, because you can't learn after you're dead). This would all require a world model more sophisticated than anything the model-based RL techniques I know of would be capable of producing by default.
  • I also would agree. The most straightforward way for these problems to emerge is if a predictor has access to the agent's source code. Though sometimes they can occur if the predictor has access to some other means of prediction which cannot be confounded by the choice of what source code the agent runs. I write a little about this in this post.
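To illustrate why source-code access produces correlations the agent can't confound, here is a minimal sketch; all names and the toy policy are hypothetical, chosen only for illustration:

```python
# A predictor with the agent's source code can predict by simulation,
# so its prediction is perfectly correlated with the actual decision.
# The policy and observation strings below are illustrative assumptions.

def agent_decision(observation):
    # A deterministic agent policy.
    return "one-box" if observation == "newcomb" else "cooperate"

def predictor(agent_source, observation):
    # Prediction via simulation: just run the agent's own function.
    return agent_source(observation)

# The prediction necessarily matches the agent's real choice.
assert predictor(agent_decision, "newcomb") == agent_decision("newcomb")
```

Nothing the agent does at decision time can break this correlation, since the predictor's output is computed from the very function that produces the decision.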
The Achilles Heel Hypothesis for AI

Thanks for the comment. +1 to it. I also agree that this is an interesting concept: using Achilles heels as containment measures. There is a discussion related to this on page 15 of the paper. In short, I think that this is possible and useful for some Achilles heels, but would be a cumbersome containment measure for others whose containment could be accomplished more simply via bribes of reward. 

Solipsism is Underrated


I disagree a bit. My point has been that it's easy for solipsism to explain consciousness and hard for materialism to, but easy for materialism to account for structure and hard for solipsism to. Don't interpret the post as my saying solipsism wins--just that it's underrated. I also don't say qualia must be irreducible, just that there's spookiness if they are.

Solipsism is Underrated

Thanks! This is insightful.

What exactly would it mean to perform a Bayesian update on you not experiencing qualia?

Good point. In an anthropic sense, the sentence this is a reply to could be redacted. Experiencing qualia themselves would not be evidence to prefer one theory over another. Only experiencing certain types of observations would cause a meaningful update.

The primitives of materialism are described in equations. Does a solipsist seek an equation to tell them how angry they will be next Tuesday? If not, what is the substance of a solipsistic model of the world?

I think this is the same type of argument as saying that other people whom I observe seem to be very similar to me. The materialistic interpretation makes us believe in a less capricious world, but there's the trouble of explaining how consciousness results from material phenomena. This is similar to my thoughts on the final 4 paragraphs of what you wrote.

I am not sure what you mean by that; I consider my mind to be just an arrangement of atoms. An arrangement governed by the same laws as the rest of the universe.

I think that works well. But I don't think that subjective experience falls out of this interpretation for free.

Solipsism is Underrated

Great comment. Thanks.

In the case of idealism, we call the ontological primitive "mental", and we say that external phenomena don't actually exist but instead we just model them as if they existed to predict experiences. I suppose this is a consistent view and isn't that different in complexity from regular materialism.

I can't disagree. This definitely shifts my thinking a bit. I think that solipsism + structured observations might be comparable in complexity to materialism + an ability for qualia to arise from material phenomena. But at that point the question hinges a bit on what we think is spookier. I'm convinced that a material solution to the hard problem of consciousness is spooky. I think I could maybe be convinced that hallucinating structured observations might be similarly spooky.

And I think you're right about the problem of knowing what we're talking about.

Solipsism is Underrated

Thanks for the comment. I'm not 100% on the computers analogy. I think answering the hard problem of consciousness is significantly different from understanding how complex information-processing systems like computers work. Any definition or framing of consciousness in terms of informational or computational theory may allow it to be studied in those terms, in the same way that computers can be understood by systems-based theoretical reasoning built on abstraction. However, I don't think this is what it means to solve the hard problem of consciousness. It seems more like solving the problem with a definition rather than an explanation.

I wonder how much differing perspectives here are due to differing intuitions. But in any case, I hope this makes my thinking more clear.

Solipsism is Underrated

I agree--thanks for the comment. When writing this post, my goal was to share a reflection on solipsism in a vacuum rather than in the context of decision theory. I acknowledge that solipsism doesn't really tend to drive someone toward caring much about others and such. In that sense, it's not very productive if someone is altruistically/externally motivated.

I don't want to give any impression that this is a particularly important decision theoretic question. :)

Dissolving Confusion around Functional Decision Theory

Thanks for the comment. I think it's exciting for this to make it into the newsletter. I am glad that you liked these principles.

I think that even lacking a concept of free will, FDT can be conveniently thought of as applying to humans through the installation of new habits or ways of thinking, without conflicting with the framework that I aim to give here. I agree that there are significant technical difficulties in thinking about when FDT applies to humans, but I wouldn't consider them philosophical difficulties.

Dissolving Confusion around Functional Decision Theory

I'm skeptical of this. Non-mere correlations are consequences of an agent's source code producing particular behaviors that the predictor can use to gain insight into the source code itself. If an agent adaptively and non-permanently modifies its source code, this (from the perspective of a predictor who suspects this to be true) de-correlates its current source code from the non-mere correlations of its past behavior -- essentially destroying the meaning of non-mere correlations to the extent that the predictor is suspicious.

Oh yes. I agree with what you mean. When I brought up the idea of an agent strategically acting in certain ways or overwriting itself to confound the predictions of adversarial predictors, I had in mind that the correlations such predictors use could be non-mere w.r.t. the reference class of agents these predictors usually deal with, but still confoundable by our design of the agent, and thereby non-mere to us.

For instance, given certain assumptions, we can make claims about which decision theories are good: CDT works amazingly well in the class of universes where agents know the consequences of all their actions, while FDT (I think) works amazingly well in the class of universes where agents know how non-merely correlated their decisions are to events in the universe but don't know why those correlations exist.
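As a toy illustration of the kind of universe FDT handles well, here is a Newcomb-style expected-value calculation; the payoffs and predictor accuracy are my own illustrative assumptions, not taken from the post:

```python
# Newcomb-style payoffs: the opaque box holds $1,000,000 iff the predictor
# predicted one-boxing; the transparent box always holds $1,000.

def one_box_ev(accuracy):
    # If the choice is non-merely correlated with the prediction,
    # one-boxing makes the $1,000,000 prediction likely.
    return accuracy * 1_000_000

def two_box_ev(accuracy):
    # Two-boxing is predicted as one-boxing only when the predictor errs.
    return (1 - accuracy) * 1_000_000 + 1_000

# With a highly accurate predictor, one-boxing dominates in expectation,
# even though a CDT-style analysis (treating the prediction as causally
# fixed) notes that two-boxing always adds $1,000 and so two-boxes.
print(one_box_ev(0.99), two_box_ev(0.99))
```

At 99% accuracy, one-boxing yields roughly $990,000 in expectation versus roughly $11,000 for two-boxing; at 50% accuracy (a predictor no better than chance, i.e. only mere correlation), two-boxing comes out ahead, matching the CDT recommendation.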

+1 to this. I agree that this is the right question to be asking, that it depends on a lot of assumptions about how adversarial an environment is, and that FDT does indeed seem to have some key advantages.

Also as a note, sorry for some differences in terminology between this post and the one I linked to on my Medium blog.
