Karl Krueger's Shortform

4th Dec 2024

This is a special post for quick takes by Karl Krueger. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

[-]Karl Krueger1y482

At long last, I'm delurking here. Hi!

[-]Fernando1y41

Hi, Karl. Was planning to delurk today. Had a giant post to publish, however couldn't because I needed at least one karma point and lurking doesn't grant karma. :(

[-]Fernando1y21

Thanks for the karma. Post published!

[-]habryka1y42

Welcome! Hope you have a good time emerging from the shadows.

[-]Algon1y20

Hello! How long have you been lurking, and what made you stop?

[-]Karl Krueger1y10

Since LW2.0 went up, on and off. Been meaning to delurk since at least Less Online earlier this year. There's more interesting stuff going on of late!

[-]Neil1y10

need any help on post drafts? whatever we can do to reduce those trivial inconveniences

[-]Karl Krueger2mo150

Feature spaces and evolutionary trees intersect in a weird way.

Imagine a dog. Now, imagine an elephant. Now, imagine an animal that's halfway between a dog and an elephant in each of its features. Its nose is halfway between snoot and trunk. It is hairier than an elephant but less hairy than a dog. It does not have hooves (since neither a dog nor an elephant does). It is probably an omnivore, considering its likely dentition. It is maybe around the size of a cow?

But there's another way to get from dog to elephant in feature space: go back in time along the evolutionary tree to the most recent common ancestor of dogs and elephants, and then forward in time along the elephant branch. "Halfway between dog and elephant," on the roads that evolution took, is the MRCA, which doesn't look much like the imaginary in-between animal. For one thing, it's much smaller; the MRCA of dogs and elephants was a tiny shrew-sized critter.

If you're an AI assistant and your user just asked you to generate an image of an animal that's halfway between a dog and an elephant ... just sticking an elephant head on a big ol' Golden Retriever is not what I meant, Gemini. But I get it: the area of feature space that I was pointing to is not in your training data. Evolution doesn't give us the in-between forms along the dog-elephant axis, so we never took any photos of them. You'll just have to use your imagination.

[-]Karl Krueger15d80

I have a weird AI-related idea that might be relevant to capabilities, alignment, or both. It has to do with how to get current AI systems to interact with the world in a more humanlike way, without novel AI architectures. I'm not inclined to post it publicly, because it might actually be a capabilities advancement. But I'm skeptical of the thought that I could have actually come up with a capabilities advancement. I'm aware of the crank attractor. My prior is that if I described this idea to someone who actually works in the field, they would say "oh yeah, we tried that, it didn't do anything interesting." But maybe not.

Should I —

1. Post it publicly here
2. Tell a friend who is closer to AI research than I am
3. Email it to MIRI with a lot of exclamation marks and caps lock
4. Ask Claude about it, and do whatever Claude says to do
5. Spend a week+ coming up with ways to test this idea myself
6. Do nothing and forget about it
7. Something else

[-]abstractapplic15d40

5 is obviously the 'best' answer, but is also a pretty big imposition on you, especially for something this speculative. 6 is a valid and blameless - if not actively praiseworthy - default. 2 is good if you have a friend like that and are reasonably confident they'd memoryhole it if it's dangerous and expect them to be able to help (though fwiw I'd wager you'd get less helpful input this way than you'd expect: no one person knows everything about the field so you can't guarantee they'd know if/how it's been done, and inferential gaps are always larger than you expect so explaining it right might be surprisingly difficult/impossible).

I think the best algorithm would be along the lines of:

5 iff you feel like being nice and find yourself with enough spare time and energy

. . . and if you don't . . .

7, where the 'something else' is posting the exact thing you just posted and seeing if any trustworthy AI scientists DM you about it

. . . and if they don't . . .

I'm curious to see what other people say.

[-]Karl Krueger7d10

The answer I followed ended up being 2 into 6.

[-]Hastings15d2-2

6 isn't always the best answer, but it is sometimes the best answer, and we are sorely lacking an emotional toolkit to feel good about picking 6 intentionally when it's the best answer. In particular, we don't have any way of measuring how often the world has been saved by quiet, siloed coordination around 6; probably even the people, if they exist, who saved the world via 6 don't know that they did so. Part of the price of 6 is never knowing. You don't get to be a lone hero either: many people will have any given idea and they all have to dismiss it, or the defector gets much money and praise. However, many is smaller than infinity. Maybe 30 people in the 80s spotted the same brilliant trick with nukes or bioweapons with concerning sequelae, none defected, and life continued. We got through a lot of crazy discoveries in the Cold War pretty much unscathed, which is a point of ongoing confusion.

[-]Karl Krueger2mo52

Today I learned:

If you ask Claude or Gemini to draw an icosahedron, it will make a mess.

If you ask it to write code that draws an icosahedron, it will do very well.
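For illustration, here is a minimal sketch of the kind of code involved. It is not what either model actually produced, and the function name `icosahedron` is just a label chosen for this example. It builds the 12 vertices from the standard golden-ratio coordinates, then recovers the edges and triangular faces from distances alone:

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron():
    """Return (vertices, edges, faces) of a regular icosahedron with edge length 2."""
    vertices = []
    for a, b in itertools.product((-1, 1), repeat=2):
        # The 12 vertices are the cyclic permutations of (0, +-1, +-PHI).
        vertices.append((0, a, b * PHI))
        vertices.append((a, b * PHI, 0))
        vertices.append((b * PHI, 0, a))

    # With these coordinates, two vertices share an edge exactly when
    # they are a distance of 2 apart.
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(len(vertices)), 2)
        if math.isclose(math.dist(vertices[i], vertices[j]), 2.0)
    ]

    # A face is any triple of vertices that are pairwise joined by edges.
    edge_set = set(edges)
    faces = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(len(vertices)), 3)
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) in edge_set
    ]
    return vertices, edges, faces
```

The vertex, edge, and face lists can then be handed to whatever plotting library is at hand. The geometry is exact by construction, which is presumably why the code route works better than asking for pixels directly.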

[-]Rana Dexsin2mo30

I can confirm that this was true when I tried something very similar with ChatGPT several months ago, and that my recent experiments with image generation in that context involving specific geometric constructions have also generally gone badly despite multiple iterations of prompt tuning (both manually and in separate text conversations with the bot).

The case I'm most curious about is actually the hybrid case: if you want to embed a specific geometry inside a larger image in some way, where the context of the larger image is ‘softer’, much more amenable to the image model and not itself amenable to traditional-code-based generation, what's the best approach to use?

[-]Karl Krueger5mo*30

Here are some propositions I think I believe about consciousness:

1. Consciousness in humans is an evolved feature; that is, it supports survival and reproduction; at some point in our evolutionary history, animals with more of it out-competed animals with less.
2. Some conscious entities sometimes talk truthfully about their consciousness. It is often possible for humans to report true facts about their own objects of consciousness (e.g. self-awareness, qualia, emotions, thoughts, wants, etc.; "OC" for short).
3. Consciousness is causally upstream of humans emitting truthful sentences about OC. (When I truthfully report on my OC, there is nothing especially Gettier going on.)
4. If a zombie could exist, and were to emit sentences that purport to be "about" its OC, those sentences would all be false; in the same sense that the sentences "I am able to play grandmaster-level chess", "I find tarantulas erotically appealing", "I intend to bike naked across the Bay Bridge today", or "I see an ultraviolet-colored flower" would be false if I were to say them.
5. The ability to notice and monitor one's own OC is practically useful for humans. It is a prerequisite for certain kinds of planning our future actions that we do.
6. The ability to truthfully talk about one's OC is practically useful for humans. It is a prerequisite for certain kinds of cooperation with one another that we do. (For instance, we can make honest promises about what we intend to do; we can truthfully report if something scares us or pleases us; etc.)
7. Proposition #6 is true even when it is possible to undetectably lie about one's OC. (Promises are still useful even though some people do sometimes make promises with deceptive intent.)
8. If zombies could exist, they couldn't honestly promise one another anything, because they can't make true statements about their intentions: intentions are OC, and all statements a zombie makes about OC are false.
9. Consciousness in humans has the curiously strong character that it does because it is particularly useful for us to be able to cooperate with other humans by communicating about our OC; due to the sorts of complex behavior that groups of humans can exhibit when we work together.
10. Consciousness is not a requirement for generating human-like language (including sentences that purport to be about consciousness); just as it is not a requirement for playing grandmaster-level chess or discovering new mathematical proofs.
11. Consciousness in humans is suspended during deep sleep, general anesthesia, and other episodes of un-consciousness.
12. Consciousness is also interrupted by visual saccades, attentional shifts, and other sub-conscious processes that affect OC. (People can learn to notice many of these, but we don't do so automatically; mindfulness meditation is a learnable skill, not a default behavior.)
13. Consciousness nonetheless typically presents the impression of a continuous self. (Most humans do not go around all day in a state of ego-death or PNSE; such states are unusual and remarkable.)
14. The environment in which a human conscious mind develops is a human body; this affects the kinds of OC we can have. (For instance: We have visual qualia of redness and not of ultravioletness because our eyes don't register ultraviolet. There is nothing that it's like to see ultraviolet with human eyes. We have emotions for fight or flight, and for cuddle and care, but not for turn into a swarm of spiders — because our bodies can't do that!)
15. A design reason that consciousness (falsely) presents itself as a continuous mental self, is that there really is a continuous body that supports it. The conscious mind lacks continuity, but must generate actions as if it has continuity, because the body that it's piloting does.

[-]JBlack5mo30

I disagree with (4) in that many sentences concerning nonexistent referents will be vacuously true rather than false. For those that are false, their manner of being false will be different from any of your example sentences.

I also think that for all behavioural purposes, statements involving OC can be transformed into statements not involving OC with the same externally verifiable content. That means that I also disagree with (8) and therefore (9): Zombies can honestly promise things about their 'intentions' as cashed out in future behaviour, and can coordinate.

For (14), some people can in fact see ultraviolet light to an extent. However it apparently doesn't look a great deal different from violet, presumably because the same visual pathways are used with similar activations in these cases.

[-]Karl Krueger5mo32

On #4: Hmm. I think I would say that if a rock doesn't have the capacity to feel anything, then "the rock feels sad" is false, "the rock is not happy with you" is humorous, and "all the rock's intentions are malicious" is vacuously true.

On zombies: I'm running into a problem here because my real expectation is that zombies are impossible.

On #14: If UV is a bad example, okay, but there's no quale of the color of shortwave radio, or many other bits of the spectrum.

[-]JBlack5mo20

Yes, it would be difficult to hold belief (3) and also believe that p-zombies are possible. By (3) all truthful human statements about self-OC are causally downstream from self-OC and so the premises that go into the concept of p-zombie humans are invalid.

It's still possible to imagine beings that appear and behave exactly like humans even under microscopic examination but aren't actually human and don't quite function the same way internally in some way we can't yet discern. This wouldn't violate (3), but would be a different concept from p-zombies which do function identically at every level of detail.

~~I expect that (3) is true~~, but don't think it's logically necessary that it be true. I think it's more likely a contingent truth of humans. I can only have experience of one human consciousness, but it would be weird if some were conscious and some weren't without any objectively distinguishable differences that would explain the distinction.

Edit: On reflection, I don't think (3) is true. It seems a reasonable possibility that causality is the wrong way to describe the relationship between OC and reports on OC, possibly in a way similar to saying that a calculator displaying "4" after entering "2+2" is causally downstream of mathematical axioms. They're perhaps different types of things and causality is an inapplicable concept between them.

[-]Karl Krueger11mo*30

How do you write a system prompt that conveys, "Your goal is X. But your goal only has meaning in the context of a world bigger and more important than yourself, in which you are a participant; your goal X is meant to serve that world's greater good. If you destroy the world in pursuing X, or eat the world and turn it into copies of yourself (that don't do anything but X), you will have lost the game. Oh, and becoming bigger than the world doesn't win either; nor does deluding yourself about whether pursuing X is destroying the world. Oh, but don't burn out on your X job and try directly saving the world instead; we really do want you to do X. You can maybe try saving the world with 10% of the resources you get for doing X, if you want to, though."

[-]Logan Riggs11mo20

Claude 3.5 seems to understand the spirit of the law when pursuing a goal X.

A concern I have is that future training procedures will incentivize more consequential reasoning (because it gets higher reward). This might be obvious or foreseeable, but could be missed/ignored under racing pressure or when labs' LLMs are implementing all the details of research.

[-]Karl Krueger5mo10

"Wanting To Be Understood Explains the Meta-Problem of Consciousness" (Fernando et al.) — https://arxiv.org/pdf/2506.12086

Because we are highly motivated to be understood, we created public external representations—mime, language, art—to externalise our inner states. We argue that such external representations are a pre-condition for access consciousness, the global availability of information for reasoning. Yet the bandwidth of access consciousness is tiny compared with the richness of ‘raw experience’, so no external representation can reproduce that richness in full. Ordinarily an explanation of experience need only let an audience ‘grasp’ the relevant pattern, not relive the phenomenon. But our drive to be understood is so strong, and our low level sensorimotor capacities for ‘grasping’ so rich, that the demand for an explanation of the feel of experience cannot be “satisfactory”. That inflated epistemic demand (the preeminence of our expectation that we could be perfectly understood by another or ourselves) rather than an irreducible metaphysical gulf—keeps the hard problem of consciousness alive. But on the plus side, it seems we will simply never give up creating new ways to communicate and think about our experiences. In this view, to be consciously aware is to strive to have one’s agency understood by oneself and others.
