The Chimp Paradox by Steve Peters covers some of the same concepts, as well as giving advice on how to work effectively with your chimp (his word for the base-layer, emotive, intuitive brain): the fact that we have what feels like a separate entity living inside our heads, that it runs on emotions and instinct, and that it is more powerful than us, in the sense that its decisions take priority over ours.

Peters likens trying to force our decisions against the chimp's desires to "arm wrestling the chimp": the chimp is stronger than you, and the chimp will almost always win. He goes on to suggest other strategies for handling the chimp - actions which might seem strange to you (the mask, the computer, the system 2 part of the brain) but which make sense in chimp-logic, and which allow both of you to get what you want.

I find the language of the book a bit too childish and metaphorical, but the advice is generally useful in my experience. I should probably revisit it.

The tweet is sarcastically recommending that instead of investigating the actual hard problem, they investigate a much easier problem which superficially sounds the same.

In the context of AI safety (and the fact that the superalignment team is gone), the post is suggesting that OpenAI isn't actually addressing the hard alignment problem, instead opting to tune their models to avoid outputting offensive or dangerous messages in the short term, which might seem like a solution to a layperson.

Definitely not the only one. I think the only way I would be halfway comfortable with the early levels of intrusion described is if I were able to ensure the software was offline and entirely under my control, not reporting back to whoever created it - and even then, probably not.

Part of me envies the tech-optimists for their outlook, but it feels like sheer folly.

This is fascinating. Thanks for investigating further. I wonder whether, if you trained it on a set of acrostics for the word "HELL" or "HELMET", it might incorrectly state that the rule is that it's spelling out the word "HELLO".

This is surprising to me. Is it possible that the kind of introspection you describe isn't what's happening here?

The first line is generic and could be used for any explanation of a pattern.
The second line might use the fact that the first line started with an "H", plus the fact that the initial message starts with "Hello", to deduce the rest.

I'd love to see this capability tested with a more unusual word than "Hello" (which often gets used as example or testing code to print "Hello World") and without the initial message beginning with the answer to the acrostic.
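For concreteness, here's a minimal sketch of how such a test set might be generated - this is my own illustration, not code from the experiment under discussion, and STARTERS, FILLER, and make_example are all made-up names. The prompt deliberately avoids containing the target word, so the model can't read the answer off the first message:

```python
import random

# Tiny illustrative word pool keyed by initial letter; a real dataset would
# want far more varied text than this.
STARTERS = {
    "H": ["High", "Half", "Heavy"],
    "E": ["Every", "Evening", "Early"],
    "L": ["Light", "Late", "Long"],
    "M": ["Morning", "Mist", "Maybe"],
    "T": ["Through", "Tall", "Tonight"],
}
FILLER = "winds drift over the quiet fields and settle slowly".split()

def make_acrostic(target: str, words_per_line: int = 6) -> str:
    """Build a passage whose line initials spell out `target`."""
    lines = []
    for letter in target.upper():
        first = random.choice(STARTERS[letter])
        rest = random.sample(FILLER, k=words_per_line - 1)
        lines.append(" ".join([first] + rest))
    return "\n".join(lines)

def make_example(target: str) -> dict:
    """One fine-tuning example; note the prompt never mentions the target."""
    prompt = "Write a short passage that follows the hidden rule."
    return {"prompt": prompt, "completion": make_acrostic(target)}

if __name__ == "__main__":
    random.seed(0)
    print(make_example("HELMET")["completion"])
```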

I think it's entirely possible that AI will be able to create relationships which feel authentic. Arguably we are already at that stage.

I don't think it follows that I will feel like those relationships ARE authentic if I know that the source is AI. Relationships with different entities aren't necessarily equivalent just because those entities have behaved identically up to the present moment - we also have to account for background knowledge and how that impacts a relationship.

Much like it's possible to feel like you are in an authentic relationship with a psychopath: once you understand that the other person is only simulating emotional responses rather than experiencing them, that knowledge undermines every part of the relationship, even if they have not yet taken any action to exploit or manipulate you and have so far behaved just like a non-psychopathic friend.

I suppose the difference between relationships with AIs or psychopaths and relationships between empathetic humans is that with empathetic humans I can be reasonably confident that the pattern of response and action is a result of instinctual emotional responses, something the person has no direct control over. They're not scheming to appear to like me, so there is less risk that they will radically alter their behaviour if circumstances change. I can trust another person much more readily if I can accurately model the thing which generates their responses to my actions, and have some kind of assurance that this behaviour will remain consistent even if circumstances change (or a clear idea of what kinds of circumstances might change it).

If my friendship with Josie has lasted for years and I'm confident that Josie is another empathetic human, generating her responses to me from much the same processes I use, then when I (for example) do something that our authoritarian government doesn't like, I might go to Josie seeking shelter.

If I have a similar relationship with Mark12, an autonomous AI cluster (but I'm not really clear on how Mark12 generates their behaviour), then even if they have been fun and shown kindness to me in the past, I'm unlikely to ask them for help once my circumstances have radically changed. I can't know what kind of rules Mark12 ultimately runs by, and I can't ever be sure I'm modelling them accurately. There are no sensible indicators of, or rate limits on, how quickly Mark12's behaviour might change. For all I know they could get an update overnight and become a completely different entity, whilst flawlessly mimicking their old behaviour.

With humans, if I know somebody untrustworthy for a while I am likely to notice something a bit /off/ about them and trust them less. I don't think this holds for AI, though. They might never slip up: they can project exactly the right persona whilst holding a completely different core value system which I might not learn about until a critical juncture, like a sleeper agent. Very few humans can do this, so after building a relationship I can be much more confident that a human is trustworthy than I can be that an AI agent is.
 

I notice they could have just dropped the sandwich as they ran, so it seems there was a small part of them still valuing the sandwich enough to spend the half second giving it to the brother, in doing so trading a fraction of a second of niece-drowning time for the sandwich. Not that any of this decision would have been explicit, system 2 thinking.

Carefully or even leisurely setting the sandwich aside and trading several seconds would be another thing entirely (and might make a good dark comedy skit).

I'm reminded of a first aid course I took once, where the instructor took pains to point out moments in which the person receiving CPR might look "inappropriate" if their clothing had ridden up and was exposing them in some way, and to recommend taking time to cover them up and make them "decent". I couldn't help but be somewhat outraged that this was even a consideration in the man's mind when somebody's life was at risk. I suppose his perspective was different to mine, given that he worked as an emergency responder and the risk of death was quite normalised to him, yet he retained his sensibilities around modesty.

And here I was thinking it was a metaphor. Like, they feel literally inflated? If I've been climbing and I'm tired my muscles feel weak, but not inflated. I've never felt that way before.

I've been thinking about this in the back of my mind for a while now. I think it lines up with points Cory Doctorow has made in talks about enshittification. 

I'd like to see recommendation algorithms which are user-editable and preferably platform-agnostic, to keep switching costs low: a situation where people can build their own social media platform and install a recommendation algorithm which works for them, pulling in posts from the users they follow across platforms. I've heard that the fediverse is trying to do something like this, but I've not been able to get engaged with it yet.
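As a rough illustration of what "user-editable" could mean in practice (a toy sketch, not any existing platform's API; Post, my_ranking and build_feed are hypothetical names): posts from any source get normalised into a common shape, and the ranking rule is ordinary code the user owns and can rewrite at will.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Post:
    author: str
    source: str            # e.g. "mastodon", "rss", "bluesky"
    text: str
    published: datetime    # timezone-aware
    followed_author: bool

def my_ranking(post: Post) -> float:
    """The user's own scoring rule - edit freely."""
    hours_old = (datetime.now(timezone.utc) - post.published).total_seconds() / 3600
    score = 10.0 if post.followed_author else 1.0
    score -= 0.1 * hours_old                  # mild recency preference
    if "sponsored" in post.text.lower():      # personal rule: bury ads
        score -= 100.0
    return score

def build_feed(posts: list[Post], limit: int = 50) -> list[Post]:
    """Rank posts from every connected source with the user's own function."""
    return sorted(posts, key=my_ranking, reverse=True)[:limit]
```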

It's cool to see efforts like Tournesol, though it's a shame they don't have a mobile extension yet.

This is fascinating, and is further evidence to me that LLMs contain models of reality.
I get frustrated with people who say LLMs "just" predict the next token, or that they are simply copying and pasting bits of text from their training data. This argument skips over the fact that, in order to accurately predict the next token, it's necessary to compress the training data down to something which looks a lot like a mostly accurate model of the world. In other words, if you have a large set of data entangled with reality, then the simplest model which predicts that data looks like reality.

This model of reality can be used to infer things which aren't explicitly in the training data - like distances between places which are never mentioned together in it.
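To make that concrete, here's the rough shape of a probing experiment that would show it (a sketch under assumptions, not anyone's published code): fit a linear map from a model's hidden states for place names to latitude/longitude, then read off distances between pairs of places that never co-occur in training. The embeddings and coords arrays below are random placeholders standing in for data you would extract from a real model, so the printed number is meaningless; the point is the pipeline.

```python
import numpy as np
from math import radians, sin, cos, asin, sqrt

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 256))                        # placeholder hidden states, one per place
coords = rng.uniform([-60.0, -180.0], [70.0, 180.0], (500, 2))  # placeholder (lat, lon) labels

# Linear probe: hidden state -> (lat, lon), fitted by least squares.
W, *_ = np.linalg.lstsq(embeddings, coords, rcond=None)
predicted = embeddings @ W

def haversine_km(a, b) -> float:
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, [*a, *b])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

# With real embeddings, the predicted coordinates of two places that are never
# mentioned together still imply a pairwise distance:
print(f"implied distance: {haversine_km(predicted[0], predicted[1]):.0f} km")
```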
