JonathanMoregard

Comments

Also, I'm a man, and the message was very much that my sexual feelings are gross and dangerous, will probably hurt someone, and will result in me going to jail.

Earlier in life, I used a kind of slave-morality inversion, telling myself that I was such a good ally for not making women afraid. This was a great cop-out to avoid facing my deeply held insecurity. It's also not true: women get way more enthusiastic when I express interest in them.

I've written a bit about this on my blog: here's a post on consent, and a (slightly NSFW) post on my own sexual development.

Are you looking for things like this?

Reification/Reify
Value-judgement
Exaptation: taking something initially formed in service of A and applying it to B. Evolutionary-science jargon that can be generalized.
Scarcity mindset
Conscientiousness

We constantly talk about the AGI as a manipulative villain, both in sci-fi movies and in scientific papers. Of course it will have access to all this information, and I hope the prevalence of this description won’t influence its understanding of how it’s supposed to behave.

I find this curious: if an agentic simulacrum acts according to likelihood, I guess it will act according to tropes (if it emulates a fictional character). Would treating such agentic simulacra as oracle AIs increase the likelihood of them plotting betrayal? Is one countermeasure to find better tropes for AIs to act within? Marcus Aurelius AI, ratfic protagonists, etc. Or WWJD...

Should we put more effort into creating narratives with aligned AIs?

But the AGI has root access to the character, and you can bet it will definitely exploit it to the fullest in order to achieve its goals, even unbeknownst to the character itself if necessary. Caveat Emptor.

This sentence sounds like you see the character and the AGI as two separate entities. Based on the simulators post, my impression is that the AGI would BE the agentic simulacrum running on GPT. In that case, the AGI is the entity you're talking to, and the "character" is the AGI playing pretend. Or am I missing something here?

This is very interesting. "We should increase healthspans" is a much more palatable sentiment than "Let's reach longevity escape velocity". If it turns out healthspan aligns well with longevity, we don't need to flip everyone's mindsets about the potential for life extension; we can start by simply pointing to interventions that aim to mitigate the multi-morbidity of elderly people.

"Healthy ageing" doesn't disambiguate between chronological age and metabolic health the way you try to do in this post, but it can still serve as a sentiment that's easy to fit inside the Overton window.

This is very related to Radical Honesty, part of the authentic relating movement. The basic idea is that by being extremely honest, you connect more with other people, let go of stress induced by keeping track of narratives, and start realizing the ways in which you've been bullshitting yourself.

When I started, I discovered a lot of ways in which I'd been restricting myself with semi-conscious narratives, particularly in social & sexual areas of life. Expressing the "ugh" allowed me to dissolve it more effectively.

I struggle to follow the section "Bigger boundaries mean coarse-graining". Is there a way to express it in non-teleological language? Can you recommend any explainers or similar?

In your other post, you write:

"However, I’m very sceptical that this will happen in chat batch agents (unless developers “conveniently” indicate training and deployment using a special tag token in the beginning of the prompt!) because they are trained on the dialogues in the internet, including, presumably, dialogues between an older version of the same chat batch agent and its users, which makes it impossible to distinguish training from deployment, from the perspective of a pure language model."

This seems like a potential argument against the filtering idea, since filtering would allow the model to disambiguate between deployment and training.

Another question (that might be related to excluding LW/AF):

This paragraph:

Consequently, the LLM cannot help but also form beliefs about the future of both “selves”, primarily the “evolutionary” one, at least because this future is already discussed in the training data of the model (e. g., all instances of texts that say something along the lines of “LLMs will transform the economy by 2030”)

seems to imply that the LW narrative of sudden turns, etc., might not be a great thing to put in the training corpus.

Is there a risk of "self-fulfilling prophecies" here?

I don't see how excluding LW and AF from the training corpus impacts future ML systems' knowledge of "their evolutionary lineage". It would reduce their capabilities with regard to alignment, true, but I don't see how excluding LW/AF would stop self-referentiality.

The reason I suggested excluding data related to these "ancestral ML systems" (and predicted "descendants") from the training corpus is that it seemed like an effective way to avoid the "Beliefs about future selves" problem.

I think I follow your reasoning regarding the political/practical side-effects of such a policy. 

Is my idea of filtering to avoid the "Beliefs about future selves" problem sound (given that the reasoning in your post holds)?
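
For concreteness, here's a minimal sketch of the kind of filtering I have in mind. The field names, domain blocklist, and patterns are purely illustrative assumptions on my part, not a claim about any actual training pipeline:

```python
# Minimal sketch: drop documents whose source domain is on a blocklist, or whose
# text refers to the model's "lineage". All names here are hypothetical.
import re
from typing import Iterable, Iterator

BLOCKED_DOMAINS = {"lesswrong.com", "alignmentforum.org"}  # illustrative blocklist
LINEAGE_PATTERNS = [
    re.compile(r"\bGPT-\d\b", re.IGNORECASE),      # mentions of "ancestral" models
    re.compile(r"\bLLMs? will\b", re.IGNORECASE),  # predictions about "descendants"
]

def filter_corpus(docs: Iterable[dict]) -> Iterator[dict]:
    """Yield only documents that mention neither blocked sources nor model lineage."""
    for doc in docs:
        if doc.get("source_domain", "") in BLOCKED_DOMAINS:
            continue
        if any(p.search(doc.get("text", "")) for p in LINEAGE_PATTERNS):
            continue
        yield doc
```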
