Epistemic status: I've had this concept for a while, and applied it in practice many times. It's fairly easy to convey in an argument, and can clear up some confusion. But there are many loose ends, and I'm not sure it's well formulated. Also, I'm stuck, so I'll just throw it out there, hoping that it grows into something.

Drawings and paintings

You are presented with a .bmp file containing digitized versions of two art pieces. You are told that one is a drawing and the other is a painting. Can you tell which one is which?

On one hand, of course you can't. The .bmp file format doesn't contain information about the history of some clumps of matter on some rock in space. All you have is the color of each pixel (plus some technical stuff). Clearly, the one that looks like a painting could be the result of the artist very carefully drawing every tiny square millimeter of the image so that the digitized version is equivalent, pixel by pixel, to the .bmp file you are looking at right now...

On the other hand, that's not what happened. The only way doubt can even arise is if the images were adversarially designed for this very purpose. Even then, it'd take a skillful artist and some serious effort to successfully confuse people, especially other artists.


Some writers prefer to start working on their novel by planning the whole plot before actually putting words on paper, while others just start writing a story and see where it goes. If you haven't encountered this distinction before, you may want to pause for a few seconds to figure out how it will affect the sort of book that results.

Stories that are written without a pre-planned plot tend to be more locally consistent. The characters' decisions are not constrained by where the plot needs to go, so they will be better aligned with their personalities, motivations, and pasts. Consequences follow more clearly from causes, because the causes aren't backwards-engineered to fit the pre-determined consequences. But you'll get a meandering story that doesn't seem to move towards a conclusion. A good example is A Song of Ice and Fire. Stories that are written with a clear outline planned in advance feel more coherent as a whole, have a conclusion, and plot twists make sense. However, characters may at times act in ways that make little sense just to keep the plot going. An example is the Harry Potter series.

But you could just write the exact same words with the other method, right?

Bottom lines

You are considering an argument. Did the person making it draw the conclusion from the argument, or did they tailor the argument to fit their pet conclusion?

In theory, you can come up with the same argument either way. In practice though...


So your friend has decided to decorate an empty wall in their room with two images of board states of Go after 50 moves. One, apparently, was produced by your friend downloading some Go software and throwing stones on the board against an AI. The other one is a game between two professionals. There is a clear difference between the two, even functionally: Go players visiting that room will draw aesthetic pleasure from one but not the other.

Of course there is nothing stopping your friend from playing moves indistinguishable from those of a top player. In practice, though, they can't. What's more, I think it's impossible for a professional to mimic the play of a beginner, even with access to a source of randomness. And yet, if you aren't a Go player, to you the two may look fundamentally no different. Telling the difference requires at least a little expertise.

Programming languages

You are presented with two long lists of numbers. One is a list of prime numbers generated by an algorithm in Python, the other one by an algorithm in C. Of course, anyone who is an expert on prime numbers will be able to tell them apart.

Utility functions

You are casually walking in a forest when you suddenly notice an agent. You don't know how the agent works, only its actions. Can you tell if the agent is internally optimizing for some utility function?

Of course, any agent has some utility function, e.g. the one that assigns 42 utilons to it doing precisely the thing it ends up doing at precisely the time it ends up doing that, and 0 otherwise.

And yet you can tell that the rock you stumbled into isn't in fact internally optimizing for a utility function.

Bit sequences

Your friend has decided to decorate a wall in their room with two sequences of 1s and 0s. One was generated by taking a 99-bit random number, plus a 1-bit checksum. The other one was generated the same way, except your friend later flipped a bit. You can of course tell the difference with a bit of expertise (no pun intended), but then the wall is rather ugly in both places and that pretty much sums it all up (no pun intended).
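
To make the "bit of expertise" concrete, here is a minimal sketch. It assumes the 1-bit checksum is an even-parity bit, which the text above does not specify; the function names are made up for illustration.

```python
import secrets


def parity_bit(bits):
    """Return the bit that makes the total number of 1s even (assumed checksum)."""
    return sum(bits) % 2


def make_sequence(n_data_bits=99):
    """Generate random data bits and append the parity bit."""
    data = [secrets.randbits(1) for _ in range(n_data_bits)]
    return data + [parity_bit(data)]


def passes_check(sequence):
    """An untouched sequence always has an even number of 1s."""
    return sum(sequence) % 2 == 0


def flip_bit(sequence, index):
    """Return a copy of the sequence with one bit flipped."""
    flipped = list(sequence)
    flipped[index] ^= 1
    return flipped


original = make_sequence()
tampered = flip_bit(original, secrets.randbelow(len(original)))

print(passes_check(original))  # True
print(passes_check(tampered))  # False
```

Under this assumption, flipping any single bit, including the checksum bit itself, changes the parity of the whole sequence, so the two walls can be told apart even though both look like noise.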


So let's say you read two claims on Twitter. For the sake of simplicity let's assume that you have absolutely no way to fact-check either of them, but somehow you know for sure that precisely one of them is true. One of them claims that hackers have hacked Russian spy satellites, the other one claims that hackers have access to the "phone directory of the military prosecutor's office of the southern military district of Russia". (Disclaimer: this hypothetical scenario only partially reflects reality.)

Fictitious claims are usually typical and generic in a way true claims aren't, and also tend to be better aligned with the interests of the person making the claim.

This understanding is something I apply extensively, and not only to identify lies. You see, there is a large demand for fictional stories presented as real in the form of novels, movies, video games, etc. In particular, I realized that there is a noticeable difference between works set in an already existing fictional world established by earlier worldbuilding, and works where the worldbuilding is done specifically to support the work in question. You probably need some expertise to notice the difference, but in my experience it's very relevant for creating immersion.

But both methods just result in a story, a sequence of words, right?

Typing or handwriting

Did I write and edit this post by typing on a keyboard, or did I copy it over from the original pen-on-paper version? Sure, I could in theory have done the latter and ended up with these very words, but I changed this sentence alone multiple times already, and I'm not even sure how I'm going to finish it yet, but it's getting disturbingly self-reflective at this point, so I'd rather finish it already; so anyway, would I really have written precisely this sentence if I hadn't been able to conveniently edit it on the fly and instead had to think the whole thing through before writing the first word? Is this difference relevant for any practical purpose?

Armchair philosophy

You are reading an essay about an interesting concept. Was it conceived by a person sitting in an armchair, rubbing their chin, or someone taking a walk in a beautiful park?

There are Many People™ with an implicit assumption that Coming to Correct Conclusions is just a sequence of Thinking Thoughts, and since Thinking Thoughts happens in the brain and you have a brain while sitting in an armchair too, you could just draw the exact same conclusions there as you would in a park.

Yes, in theory, you could.

Reading books reportedly has the effect that the reader forms models of the world they wouldn't otherwise form. Appreciating art can also spark insights, even when the art has no propositional content. Being tortured is not conducive to creating accurate high-level maps. Some funny people even make funny claims, such as that sitting in silence for a while is necessary to understand certain aspects of the world.

In a park, light conditions are going to be different, which affects alertness. Walking affects blood flow to the brain. Context-dependent memory is a thing.

Of course, I couldn't tell the difference just by reading the resulting essay. Maybe an expert on armchair-sitting and park-walking could. Maybe a strong enough classifier AI could. But by default, in the absence of a strong argument otherwise, I'll assume that the effect is significant in some direction, and relevant along at least some reasonable values humans might care about.


Your friend recounts the contents of the latest conversation they had with a stranger on the internet. Was it a conversation limited to 280-character units of transmission, or an exchange of longform essays, or a voice chat?

What would happen to your conversations with your friends if you introduced arbitrary rules to them? E.g. (babble):

  • every time you say something, your friend has to explain in their own words what they think you said,
  • the order in which participants are allowed to speak is fixed, and so those less confident are also forced to contribute to the conversation,
  • whenever an interesting point comes up, it's written down, and you keep returning to these points until there is nothing else left to say about them.

I know Many People™ (myself included) have an aversion to introducing arbitrary rules like that. Why not just let everyone talk whenever they feel like, and say whatever they feel like? But of course no one wants that: most people would agree that we need arbitrary rules such as "don't talk over each other", or that moderation is required in online spaces. Also, you can't avoid arbitrary rules, you can only decide between letting them be imposed by the environment in an accidental manner or consciously choosing them.

In theory, none of this should matter. In theory, it's possible to have intelligent discussion on Twitter.

Final Thoughts

The method you use to create a product leaves its mark on it. The fact that the set of possible products two different methods can create is the same does not mean that the choice of your method is just a detail of implementation. Sometimes there is no difference at all, or no significant difference, or the difference is significant but mostly irrelevant relative to your goals, but you shouldn't assume this by default, without some strong arguments.

The power of this concept comes from being able to recognize it at work in various aspects of your life. This is an excellent topic to do your own babble on.

Thanks to Justis for the feedback that helped me articulate my thoughts.



How can you tell the difference between a list of primes generated in C or Python? Or is that just trolling?

The list generated by C is longer than the one generated by Python?

That makes way too many assumptions. Maybe the longer one was generated by a more efficient Python library, or I ran the Python program for longer.

My guess would be trolling. 

(Now random numbers generated in C vs Python, that one you can probably tell apart with enough effort).

Would it be fair to describe this concept abstractly as follows:

"Two processes can theoretically generate exactly the same set of objects, but with different probabilities. Therefore, when presented with a given object, one can use Bayes' theorem to infer which process generated it."

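A small worked example of that inference, as a sketch only: the process names, the priors, and the likelihoods below are invented numbers, not taken from the post.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem.

    priors:      process name -> P(process)
    likelihoods: process name -> P(observed object | process)
    Returns process name -> P(process | observed object).
    """
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    evidence = sum(joint.values())
    return {name: value / evidence for name, value in joint.items()}


# Hypothetical numbers: both processes can produce the object,
# but process A produces it far more often than process B.
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihoods = {"A": Fraction(9, 10), "B": Fraction(1, 100)}

result = posterior(priors, likelihoods)
print(result["A"])  # 90/91
print(result["B"])  # 1/91
```

Process B could have produced the object, so nothing is proven, yet with these numbers the odds are 90 to 1 that A did.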

You draw an element at random from distribution A.

Or you draw an element at random from distribution B.

The range of the distributions is the same, so anything you draw from B could have been drawn from A. And yet...

The range of the distributions is the same, so anything you draw from B could have been drawn from A

This does not hold in pathological cases, I don't think. As one example:

(I can't be bothered to figure out the correct constants to normalize these. Should be fairly straightforward to calculate from the standard infinite-geometric-series-sum formula.)

Both of these functions have the same range, , but there is no overlap between the two functions. Their support is disjoint.

That's a function, he was referring to a distribution

They are both (un-normalized) probability density functions, as per the names  and . My apologies if that was unclear.

To be somewhat clearer: I was referring to the probability distributions described by these two probability density functions. They have the same range, but disjoint support, and so anything you drew from B could not have been drawn from A.

In other words, a (pathological) counterexample to "The range of the distributions is the same, so anything you draw from B could have been drawn from A".

Wikipedia says:

In mathematics, the range of a function may refer to either of two closely related concepts: The codomain of the function; The image of the function.

I meant the image. At least that's what you call it for a function; I don't know the terminology for distributions. Honestly I wasn't thinking much about the word "range", and should have simply said:

Anything you draw from B could have been drawn from A. And yet...

Before anyone starts on about how this statement isn't well defined because the probability of selecting any particular value from a continuous distribution is zero, I'll point out that I've never seen anyone draw a real number uniformly at random between 0 and 1 from a hat. Even if you are actually selecting from a continuous distribution, the observations we can make about it are finite, so the relevant probabilities are all finite.

I was assuming you meant range as in the statistical term (for a distribution, roughly, the maximum x for which pdf(x) > 0, minus the minimum x for which pdf(x) > 0).

Annoyingly, this is closer to the domain than it is the range, in function terminology.

I meant the image.

Are you sure? The range is a description of the possible outputs of the pdf, which means... almost nothing. Trivial counterexample if you do mean image:

Uniform distribution A between 0 and 0.5 (that is, 2 for 0..0.5, and 0 otherwise).

Uniform distribution B between 1.0 and 1.5 (that is, 2 for 1.0..1.5, and 0 otherwise).

Both of these distributions have the same image {0, 2}. And yet they are disjoint.
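
The two uniform densities above can be checked numerically. This is only a sketch: the grid of sample points is an arbitrary choice, and `pdf_a` and `pdf_b` are names made up here.

```python
def pdf_a(x):
    """Density of the uniform distribution on [0, 0.5]."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0


def pdf_b(x):
    """Density of the uniform distribution on [1.0, 1.5]."""
    return 2.0 if 1.0 <= x <= 1.5 else 0.0


# Sample points from -0.5 to 2.0 in steps of 0.01 (an arbitrary grid).
grid = [i / 100 for i in range(-50, 201)]

# Image: the set of values each density takes on the grid.
image_a = {pdf_a(x) for x in grid}
image_b = {pdf_b(x) for x in grid}

# Support: the grid points where each density is positive.
support_a = {x for x in grid if pdf_a(x) > 0}
support_b = {x for x in grid if pdf_b(x) > 0}

print(image_a == image_b)     # True
print(support_a & support_b)  # set()
```

The images coincide while the supports share no point, so nothing drawn from B could have been drawn from A.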

Honestly I wasn't thinking much about the word "range", and should have simply said:
> Anything you draw from B could have been drawn from A.

There are many probability distributions where this is not the case. (Like the two uniform distributions A and B I give in this post.)


Oh. You said you don't know the terminology for distributions. Is it possible you're under a misunderstanding of what a distribution is? It's an "input" of a possible result, and an "output" of how probable that result is[1]. The output is not a result. The input is.

  1. ...to way oversimplify, especially for continuous distributions.

Oh. You said you don’t know the terminology for distributions. Is it possible you’re under a misunderstanding of what a distribution is? It’s an “input” of a possible result, and an “output” of how probable that result is.

Yup, it was that. I thought "possible values of the distribution", and my brain output "range, like in functions". I shall endeavor not to use a technical term when I don't mean it or need it, because wow was this a tangent.

If I may ask, why didn't you use the following (simpler imo) example: pmf_A(0) = 1 pmf_A(1) = 0 pmf_B(0) = 0 pmf_B(1) = 1

Or even the "Bit sequences" part of the post?

If I may ask, why didn't you use the following (simpler imo) example: pmf_A(0) = 1 pmf_A(1) = 0 pmf_B(0) = 0 pmf_B(1) = 1

With that approach one can argue that the two PMFs have different ranges[1], and get rabbit-holed into a discussion of e.g. "is a uniform distribution from 0 to 1 with a range of -10 to 10 the same or different than a uniform distribution from 0 to 1 with a range of 0 to 1".

This approach is more complex, but sidesteps that.

  1. ^

What about


Both functions' support has the same minimum (0) and maximum (2).

I really enjoyed your use of obviously incorrect examples to draw the reader into a deeper understanding of the issue. If you don't mind, I might steal that technique for myself ;)

Now I'm curious which of the examples were incorrect according to you. For me:

  • Programming languages (though I took it as a joke about Python being slow)
  • Utility function: When someone comes across a rock, one knows that it is not optimizing for a utility function not from its actions but rather from preexisting knowledge about its (lack of) inner mechanics. Swap out the common rock for a rock magically looking exactly like a human standing still. Now, it's much less obvious from a short observation that it isn't optimising for a utility function, even though it does the same actions.

The other examples seemed correct to me.

My mind immediately went to the difficulty of excluding crystal growth from definitions of life...

Programming languages: If they were written idiomatically and quickly, you can absolutely tell the difference between a list of primes generated by a Python vs C program. Hint: Python and C have different default numeric types.
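
The hint is not spelled out further, so what follows is only one reading of it: Python's int has arbitrary precision, while a quickly written C program might store primes in a 32-bit signed int. The sketch assumes that width and assumes overflow wraps around in two's complement. Signed overflow is undefined behavior in C, so a real program may do something else entirely. All function names are made up for illustration.

```python
def is_prime(n):
    """Trial division; slow but enough for a handful of candidates."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_after(n):
    """Return the smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def wrap_int32(value):
    """Reduce an integer to the range of a two's-complement 32-bit signed int."""
    return (value + 2**31) % 2**32 - 2**31


int32_max = 2**31 - 1
p = next_prime_after(int32_max)

print(p)              # Python keeps the exact value
print(wrap_int32(p))  # what an assumed wrapping 32-bit int would hold instead
```

Under these assumptions the two lists agree up to 2**31 - 1 and diverge afterwards, which a reader who knows the default numeric types could spot.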

Reminds me of the fact that Kolmogorov complexity depends on the model of computation: two Turing-complete systems can do the same set of things, but it may be far simpler to do something in one than the other.


In a video by web developer Joshua Morony, he explains why he won’t rely on GPT-4 for his code, despite its efficiency. Me paraphrasing: In coding, knowing why and how a system was designed, edge cases accounted for, etc. is valuable because it contextualizes future decision-making. But if you allow AI to design your code, you’ll lose that knowledge. And if you plan on prompting AI to develop further, you’ll lack the judgment to direct it most coherently.


Many have written about how it’s the writing process itself that generates the great ideas comprising a final product. I’ve briefly consolidated those ideas before. This would warn against writing essays with ChatGPT. But it would also apply to, e.g., hiring an editor to compile a book from your own notes, rather than writing it yourself. Outsourcing the labour of selecting words, ordering ideas etc. also offloads the opportunity to generate new ideas and fascinating relationships between them -- often much greater ideas than the ones you started with. Of course, your editor may notice all this, but the principal-agent problem may prevent them from working beyond their scope. Even if they went above and beyond, the arrangement still depends on you not caring that the ideas in the book aren’t your own, or that you don’t even understand the ideas in your own book, since you aren’t the one who laboured to earn an intuition about them.

Automating some work is fine. I ran both the above paragraphs through ChatGPT for brevity. But even then, I wrote them beforehand myself, which generated many ideas that I didn’t yet have at the outset of writing, for example the principal-agent consideration. I even ran this paragraph through ChatGPT, but chose not to adopt its summary. And in the process of rewriting it, I thought to include the principal-agent example two sentences ago. If I had just asked ChatGPT (or some editor) to write a decent response to this post, even with general direction, I doubt it would have been as thoughtful as I was. But of course, ChatGPT or an editor could have come up with a comment exactly as thoughtful, right?

Thank you for writing this up! It does, indeed, incite a bit of babble from my brain.

It seems to me like you are pointing at a cluster of several related concepts, though perhaps that's because their underlying pattern of similarity is one of those things that seems obvious upon examination despite being non-obvious if one neglects to seek it out. Or perhaps I'm missing an interim level of commonality.

Drawings and paintings

If you spend much time in art software, you'll notice that some digital brushes act like pens or pencils, others act like paintbrushes, and yet others act like stamps. The difference is not in the color that each tool applies to the virtual canvas, but rather in the interface between where the color was applied and where it wasn't. I can reach the same arrangement of pixels with either tool (this is easiest to prove using an extremely small canvas and extremely limited color palette), but some arrangements are easier to reach with one tool, and some are easier to reach with another.

We tend to assume that a piece of art was created with the tool that's easiest to reach it with, but sometimes part of the artistic performance is intentionally choosing a more difficult tool to get to a similar result. Consider those who pour weeks into photorealistic drawing and painting instead of spending mere minutes or hours to get an indistinguishably similar result with a camera.

And yet you can tell that the rock you stumbled into isn't in fact internally optimizing for a utility function.

Does the set of utility functions have no identity element? The rock's purpose is to be a rock, and it's probably doing that quite well... If the thing you stumbled into is actually a log or a clod of dirt, it would be objectively bad at being a rock, and you might notice the difference when you notice that the log can float or burn, and the dirt can crumble easily. "Wow, that thing I thought was a rock is behaving in such an un-rock-like manner that I'd better re-evaluate it!" Sure, a rock of volcanic pumice might float and feel very light for its volume, but I could consistently call it "not very good at being a rock" if I tried to use it for any purpose that requires a rock to be dense and heavy. The idea of what it means for something to be a rock comes from the observer, not the rock itself, and so the idea of the rock exists in a system that tends to like attributing some level of agency to everything it simulates.

What a rock does is follow the laws of physics, and that's ultimately what everything else does too, although the rock has fewer moving parts, and we rarely look at rocks in a way that inspires us to model them as making choices. (counterexample: "my granite countertop decided to crack last week; it must not have liked the way the supports were installed under it")

The method you use to create a product leaves its mark on it.

I read this as "we can very easily confuse different products for being the same thing".

This is most obvious in your programming example: If the same prime-finding algorithm was implemented in Python and C, and you didn't tell the observer how long each program had taken to run, they would not be able to tell the two lists apart, because it'd be the same list. If the output from the same algorithm differed, the difference would be proof that you had erred in implementing it, and thus the algorithm wasn't the same at all. It's only when you start changing the algorithm to suit the strengths of each language that you might get different outputs.