Jonas Hallgren

Alignment Field Building and AI alignment focused, especially interested in agent foundations 


Hey you, whoever is reading this comment: this post is not an excuse to skip working on alignment. I can fully relate to the fear of death here, and my own tradeoff is focusing hard on instrumental goals such as my own physical health and nutrition (including supplements) to delay death and get some nice productivity benefits. This doesn't mean that an AI won't kill you within 15 years, so not working on alignment is most likely not even a defection in a tragedy of the commons; working on it is rather paramount to your future success at being alive.

(Also, if we solve alignment, then we can get a pretty OP AGI that can help us out with the other stuff, so really it's very much a win-win in my mind.)

I wanted to say that I tried learning IIT about 2 years ago after reading about most other theories of consciousness and that it was a pain in the ass and so I gave up. Thank you for this post and especially that section. I really like attention schema theory as I hadn't thought about combining the approaches of GWT and strange loops.

I've also got one thing that I want to bring up about your conclusion and also one perspective that I personally find interesting that you don't necessarily bring up in the post.

With regards to the conclusion:
Why do you assume that consciousness is an illusion just because consciousness is a strange loop observing itself? Why can't self-referential things be real? My belief here is that everything that is a possible mathematical structure is real, and even though self-referring leads to error in any axiomatic system, that just means we can't define the things themselves. Just like how spacetime forms singularities, this can exist in a world which is as real as any other world. 

Secondly, you mention panpsychism quickly, but you don't mention the no-self perspective on panpsychism. This is quite an Eastern perspective, and it states that you are every experience that you have and that everything that appears is itself part of consciousness. This is essentially panpsychism, with every experience divided into its smallest subcomponents. The evidence for this would be meditation: I can feel that I am the sense experience in my fingers while writing. You don't really mention this view.

Lastly, I want to mention how I bring these two views together in my head. My feeling of being my fingers doesn't arise from the fact that I'm observing myself as a strange loop but instead that every experience is a strange loop in itself. 
The logical conclusion of strange loops is in my opinion that every part of reality is a strange loop viewing itself and that every system can come up with a symbol for I even if it's not what we think of as thinking.

First and foremost, my confidence in the descriptions of different distillation methods is pretty low. It is a framework I've thrown together from discussions on what an optimal science communication landscape would look like. It is in its initial phases and will most likely be imperfect for quite some time as finding the optimal communication landscape is a difficult problem. 

Secondly, great point! I think of it as a "reinterpretation of existing research." The basic way of doing this is rewriting a post for higher clarity, which is the classical way a distillation is viewed.

I think there are more ways of doing this and that the space is underexplored. In terms of the terminology proposed in the course, a "classic" distillation is some combination of what I would describe as propagating and bushwhacking.

Bushwhacking would be more something like asking, "what the f*ck is going on here?" which might be relevant for things such as infra-Bayesianism (I want to learn infra-Bayesianism; can someone please bushwhack this?).

Propagating would be more of what Rob Miles is doing. 

So what is distillation? What is the superclass of all of these? 

I would phrase it as follows: "A distillation is a work that takes existing research and reinterprets it in a new light."

Finally, a meta point in defence of the introduction of new jargon. I think the term distillation is confusing in itself as it can mean a lot of things, and therefore if you say, "I'm bushwhacking this post" you get the idea that "ah, this person is cutting down the weeds of what is a confusing post". I hope to introduce new methodology so it is easier to understand what type of distillation someone is doing. (I don't think this terminology is optimal, but it's a start in the right direction IMO.)

I will post my favourite poem to describe how I feel: 

Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.

Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.

And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.

I will not go gentle into that cold night.

It feels kind of weird that this post only has 50 upvotes and is hidden in the layers of LessWrong like some skeleton in the closet waiting to strike at an opportune time. A lot of big names commented on this post, and even though it's not entirely true and misrepresents what happened to an extent, it would make sense to promote this type of post anyway. It sets a bad example if we don't promote it, as we then show that we don't encourage criticism, which seems very anti-rational. Maybe a summary article of this incident could be written and put on the main website? It doesn't make sense to me that a post with a whopping 900 comments should be this hidden, and it sure doesn't look good from an outside perspective.

Maybe this isn't the most productive comment, but I just wanted to say that this was a really good post. It's right up my alley, with video games and academics at the same time, and I would therefore like to declare it a certified hood classic. (Apparently Grammarly thinks this comment is formal, which is pretty funny.)

Wow, this changed my life! Never thought I would find something this mind-blowingly overpowered on LessWrong!

But the problem runs deeper than that. If we draw an arrow in the direction of the deterministic function, we will be drawing an arrow of time from the more refined version of the structure to the coarser version of that structure, which is in the opposite direction of all of our examples.

As I currently understand this after thinking about it for a bit, we are talking about the coarseness of the model from the perspective of the model in the time frame that it is in, and not the time frame that we are in. It would make sense for our predictions of the model to become coarser with each step forward in time if we are predicting it from a certain time into the future time-space. I don't know if this makes sense, but I would be grateful for a clarification!

Good question. This applies at the level of whole systems: for example, a democratic system is going to be inherently more reversible than a non-democratic one. An action that goes against the reversibility of a system could, for example, be the removal of freedom of speech, as it would narrow down the potential pathways of future civilizations. Reversibility has an inherent opportunity cost, as it asks us to take into consideration the possibility of other morals being correct. This is like Pascal's mugging, but with the stakes being that if we have the wrong moral theory then we lose a lot. This means that reversibility might look less effective through a utilitarian lens, as there are actions that might be good from the utilitarian standpoint, such as turning everything into hedonium, that are bad from a reversibility standpoint because we can't change anything from there.