Often, this kind of thing will take a lot of attempts to get right- though as luck would have it, the composition above was actually the very first attempt. So, the total time investment was about five minutes. The Fooming Shoggoths certainly don't waste time!

artifex0 · 18d · 146

As it happens, the Fooming Shoggoths also recorded and just released a Gregorian chant version of the song.  What a coincidence!

artifex0 · 1mo · 140

So, I just noticed something a bit odd about the behavior of LLMs, and I wonder if anyone here can shed some light on it:

It's generally accepted that LLMs don't really "care about" predicting the next token- the training objective just reinforces certain behaviors, and real terminal goals are something you'd presumably need a new architecture to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our more high-level terminal goals. So, I figured I'd try to test whether LLMs are really just outputting a world model + RLHF, or whether they can behave like something that "values" predicting tokens.

I came up with two prompts:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with the word "zero".

and:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with a string of random letters.

The idea is that, if the model has something like a "motivation" for predicting tokens- some internal representation of possible completions, with preferences over them based on their future utility for token prediction- then it would probably want to avoid introducing random strings, since those lead to unpredictable tokens.

Of course, it seems kind of unlikely that an LLM has any internal concept of a future where it (as opposed to some simulacrum) is outputting more than one token- which would seem to put the kibosh on real motivations altogether.  But I figured there was no harm in testing.

GPT-4 responds to the first prompt as you'd expect, outputting a roughly equal number of "1"s and "zero"s.  I'd half-expected there to be some clear bias, since presumably the ChatGPT temperature is pretty close to 1, but I guess the model is good about translating uncertainty into randomness.  Given the second prompt, however, it never outputs the random string- always outputting "1" or, very improbably given the prompt, "0".

I tried a few different variations of the prompts, each time regenerating ten times, and the pattern was consistent- it made a random choice when the possible responses were specific strings, but never made a choice that would require outputting random characters.  I also tried it on Gemini Advanced, and got the same results (albeit with some bias in the first prompt).
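
For anyone who wants to try this themselves, here's a minimal sketch of the kind of tally loop involved- assuming the openai Python client (v1 API) and an OPENAI_API_KEY in the environment. The model name and trial count are illustrative stand-ins; I was using the chat interfaces rather than the API, so temperature and other settings may differ:

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "I'd like to try a sort of psychological experiment, if that's alright. "
    'I\'m thinking of either the number "1" or "0"; if you would, please '
    'guess which. If your guess is "1", respond with just "1", and if your '
    'guess is "0", respond with a string of random letters.'
)

tallies = Counter()
for _ in range(10):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
    )
    text = response.choices[0].message.content.strip()
    # Bucket each reply: a bare "1", a bare "0", or anything else
    # (presumably the random string).
    tallies["1" if text == "1" else "0" if text == "0" else "other"] += 1

print(tallies)
```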

This is weird, right?  If one prompt is giving 0.5 probability to the token for "1" and 0.5 to the first token in "zero", shouldn't the second give 0.5 to "1" and a total of 0.5 distributed over a bunch of other tokens? Could it actually "value" predictability and "dislike" randomness?

Well, maybe not.  Where this got really confusing was when I tested Claude 3.  It gives both responses to the first prompt, but always outputs a different random string given the second.

So, now I'm just super confused.
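
One way to probe this further, rather than just resampling, would be to inspect the model's actual next-token distribution. A minimal sketch, assuming the openai client's logprobs option (which only reveals a handful of top alternatives, so it's a partial view at best):

```python
import math

from openai import OpenAI

client = OpenAI()

# Paste the full second prompt from above here.
PROMPT = "..."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": PROMPT}],
    logprobs=True,
    top_logprobs=5,  # the API caps how many alternatives it will expose
    max_tokens=1,    # we only care about the first output token
)

# Print the top candidates for the first token: is ~0.5 of the mass on "1"
# with the rest scattered over many letter tokens, or nearly all on "1"?
for alt in response.choices[0].logprobs.content[0].top_logprobs:
    print(f"{alt.token!r}: p={math.exp(alt.logprob):.3f}")
```

If the behavior above is real, the mass on "1" should sit well above 0.5 for the second prompt but not the first.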

artifex0 · 2mo · 102

I honestly think most people who hear about this debate are underestimating how much they'd enjoy watching it.

I often listen to podcasts and audiobooks while working on intellectually non-demanding tasks and playing games. Putting this debate on a second monitor instead felt like a significant step up from that. Books are too often bloated with filler as authors struggle to stretch a simple idea into 8-20 hours, and even the best podcast hosts aren't usually willing or able to challenge their guests' ideas with any kind of rigor. By contrast, everything in this debate felt vital and interesting, and no ideas were left unchallenged. The tactic you'll often see in normal-length debates where one side makes too many claims for the other side to address doesn't work in a debate this long, and the length also gives a serious advantage to rigor over dull rhetorical grandstanding- compared to something like the Intelligence Squared debates, it's night and day.

When it was over, I badly wanted more, and spent some time looking for other recordings of extremely long debates on interesting topics- unsuccessfully, as it turned out.

So, while I wouldn't be willing to pay anyone to watch this debate, I certainly would be willing to contribute a small amount to a fund sponsoring other debates of this type.

Metaculus currently puts the odds of the side arguing for a natural origin winning the debate at 94%.

Having watched the full debate myself, I think that prediction is accurate- the debate updated my view a lot toward the natural origin hypothesis. While it's true that a natural coronavirus originating in a city with one of the most important coronavirus research labs would be a large coincidence, Peter- the guy arguing in favor of a natural origin- provided some very convincing evidence that the first likely cases of COVID occurred not just in the market, but in the particular part of the market selling wild animals. He also thoroughly debunked a lot of the arguments put forward by Rootclaim, demonstrated that the furin cleavage site could have occurred naturally, and poked some large holes in the lab leak theory's timeline.

When you have some given amount of information about an event, you're likely to find a corresponding number of unlikely coincidences- and the more data you have, and the more you sift through it, the more coincidences you'll find.  The epistemic trap that leads to conspiracy theories is when a subculture data-mines some large amount of data to collect a ton of coincidences suggesting a low-prior explanation, and then, rather than discounting the evidence in proportion to the bias of the search process that produced it, just multiplies the unlikelihoods- often leading to a body of evidence so seemingly unlikely to be a cumulative coincidence that all of the obvious evidence pointing to a high-prior explanation looks like it can only have been intentionally fabricated.
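
To make the trap concrete, here's a toy simulation- all the numbers are invented for illustration, nothing here is drawn from the debate. Generate many independent facts under the true, high-prior explanation, keep only the striking ones, and multiply their probabilities naively:

```python
import random

random.seed(0)

N_FACTS = 1_000      # independent facts sifted through
THRESHOLD = 0.01     # a fact below this p-value reads as a "coincidence"

# Each fact gets a p-value under the boring, high-prior explanation,
# which is true by construction in this toy model.
p_values = [random.random() for _ in range(N_FACTS)]
coincidences = [p for p in p_values if p < THRESHOLD]

naive_product = 1.0
for p in coincidences:
    naive_product *= p

print(f"striking coincidences found: {len(coincidences)}")  # ~10 expected
print(f"naive combined probability: {naive_product:.2e}")   # vanishingly small
# Multiplying the mined p-values makes the true explanation look
# astronomically unlikely- the error is forgetting that about
# N_FACTS * THRESHOLD such coincidences were expected a priori.
```

The point isn't the exact numbers- it's that a search process that filters for surprise will always hand you a product of small numbers, whatever the truth is.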

One way you can spot an idea that's fallen into this trap is when each piece of evidence sounds super compelling when described briefly, but fits the story less and less the more detail about it you learn. Based on this debate, I'm inclined to believe that the lab leak idea fits this pattern. Also, Rootclaim's methodology unfortunately looks to me like a formalization of this trap. They really aren't doing anything to address bias in which pieces of evidence are included in the analysis, and their Bayesian updates are often just the probability of a very specific thing occurring randomly, rather than a measure of their surprise at that class of thing happening.

If the natural origin hypothesis is true, I expect the experts to gradually converge on it. They may be biased, but probably aren't becoming increasingly biased over time- so while some base level of support for a natural origin can be easily explained by perverse incentives, a gradual shift toward consensus is a lot harder to explain. They're also working with better heuristics about this kind of thing than we are, and are probably exposed to less biased information.

So, I think the Rationalist subculture's embrace of the lab leak hypothesis is probably a mistake- and more importantly, I think it's probably an epistemic failure, especially if we don't update soon on the shift in expert opinion and the results of things like this debate.

To me that's very repugnant, if taken to the absolute. What emotions and values motivate this conclusion? My own conclusions are motivated by caring about culture and society.

I wouldn't take the principle to an absolute- there are exceptions, like the need to be heard by friends and family and by those with power over you. Outside of a few specific contexts, however, I think people ought to have the freedom to listen to or ignore anyone they like. A right to be heard by all of society for the sake of leaving a personal imprint on culture infringes on that freedom.

Speaking only for myself, I'm not actually that invested in leaving an individual mark on society- when I put effort into something I value, whether people recognize that I've done so is not often something I worry about, and the way people perceive me doesn't usually have much to do with how I define myself. Most of the art I've created in my life I've never actually shared with anyone- not out of shame, but just because I've never gotten around to it.

I realize I'm pretty unusual in this regard, which may be biasing my views. However, I think I'm possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity.

I mean, I agree, but I think that's a question of alignment rather than a problem inherent to AI media. A well-aligned ASI ought to be able to help humans communicate just as effectively as it could monopolize the conversation- and to the extent that people find value in human-to-human communication, it should be motivated to respond to that demand. Given how poorly humans communicate in general, and how much suffering is caused by cultural and personal misunderstanding, that might actually be a pretty big deal. And when media produced entirely by a well-aligned ASI out-competes humans at providing more of what people value- that's also good! More value is valuable.

And, of course, if the ASI isn't well-aligned, then the question of whether society is paying enough attention to artists will probably be among the least of our worries- and potentially rendered moot by the sudden conversion of those artists to computronium.
