A chat log is not a simulation because it uses English for all state updates. It’s a story. In a story you’re allowed to add plot twists that wouldn’t have any counterpart in anything we’d consider a simulation (like a video game), and the chatbot may go along with it. There are no rules. It’s Calvinball.

For example, you could redefine the past of the character you’re talking to, by talking about something you did together before. That’s not a valid move in most games.

There are still mysteries about how a language model chooses its next token at inference

Interesting work! Could this be fixed in training by giving it practice at repeating each token when asked?

Another thing I’ve wondered is how substring operations can work for tokenized text. For example, if you ask for the first letter of a string, it will often get it right. How does that happen, and are there tokens where it doesn’t work?

You could do this, if you wanted. I suspect that when ChatGPT was patched, they instead just patched the tokenizer to no longer create these tokens, which is significantly easier and would also allow the model to repeat them without too much trouble. I think that substring operations would mainly work with tokens that are used a fair bit. My model of the situation is that there is some loss it would leave on the table if it didn't know some facts about substrings of common tokens, so it learns them. For instance, it would help it complete more acronyms, and if people prefer or avoid alliteration in certain contexts, it would help to predict text. If it was trained on social media, sometimes people will spell things out in ALL CAPITAL LETTERS, or do iNtErCaPs or whatever you call that, which would let it know all sorts of facts about the innards of tokens.
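To make the substring question concrete, here is a rough sketch of why "first letter of a string" is not free for a model that only sees token IDs. Everything in it is an assumption for illustration: the vocabulary is a made-up toy, and greedy longest-match is a simplification of how real tokenizers (such as BPE) split text. The point is only that the spelling of each token has to be stored somewhere as a learned lookup, which is presumably more reliable for frequently seen tokens.

```python
# Toy vocabulary: hypothetical, not taken from any real tokenizer.
TOY_VOCAB = ["straw", "berry", "str", "aw", "b", "e", "r", "y", "s", "t", "a", "w"]
TOKEN_ID = {piece: i for i, piece in enumerate(TOY_VOCAB)}


def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary.

    Real tokenizers use different algorithms; this is a simplification.
    """
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_ID:
                ids.append(TOKEN_ID[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


# What a model would effectively have to learn: a per-token table
# from opaque ID to spelling facts. Here we build it by cheating
# (reading the vocabulary); a model only gets it from training data.
FIRST_LETTER = {i: piece[0] for i, piece in enumerate(TOY_VOCAB)}


def first_letter_from_ids(ids):
    """Answer 'what is the first letter?' using token IDs only."""
    return FIRST_LETTER[ids[0]]


ids = tokenize("strawberry")
print(ids, first_letter_from_ids(ids))
```

In this toy setup the answer depends entirely on the entry for the first token, so a rarely seen token with a poorly learned entry is where you would expect failures.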

I think this is a question about markets, like whether people are more likely to buy healthy versus unhealthy food. Clearly, unhealthy food has an enormous market, but healthy food is doing pretty well too.

Porn is common and it seems closer to unhealthy food. Therapy isn’t so common, but that’s partly because it’s expensive, and it’s not like being a therapist is a rare profession.

Are there healthy versus unhealthy social networks? Clearly, some are more unhealthy than others. I suspect it's in some ways easier to build a business around mostly-healthy cha

There's an assumption that the text that language models are trained on can be coherently integrated somehow. But the input is a babel of unreliable and contradictory opinions. Training to convincingly imitate any of a bunch of opinions, many of which are false, may not result in a coherent model of the world, but rather a model of a lot of nonsense on the Internet.

Do you have much actual experience playing around with large language models? In my experience, the knowledge/world model of GPT-3/ChatGPT is coherently integrated.

This seems empirically false in my experience using language models, and prima facie unlikely. Lots of text on the internet is just reporting about underlying reality:

* Log files
* Research papers
* Academic and industry reports
* Etc.

Learning to predict such reports of reality would privilege processes that can learn the structure of reality.

Furthermore, text that is fact and text that is fiction are often distinguished by writing style or presentation. In my experience, large language models do not conflate fact and fiction.

I'm wondering who, if anyone, keeps track of throughput at a port? Ideally there would be some kind of graph of containers shipped per day and we could see long-term shipping trends.

(This is making a bad assumption that containers are fungible, but we would at least have a rough idea of how bad the problem is.)

The Port of Long Beach, whose problems instigated the post the OP is responding to, publishes container movements here:

They're up ~150,000 TEUs since early last year (just plotting the "total" column and eyeballing it). IIRC, most containers are 40', and a 40' container counts as 2 TEUs, so that's somewhere around 75,000 more containers per month. Note that this is both inbound and outbound, full and empty.

There's enough uncertainty in both of those figures I wouldn't take it to the bank, but it shows that shipping volumes have increased at the port.
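The TEU-to-container conversion above can be sketched as a small calculation. This is a rough illustration only: the 150,000 figure is the eyeballed estimate from the comment, and `share_40ft` is a hypothetical parameter for the fraction of boxes that are 40' (the rest assumed to be 20', i.e. 1 TEU each).

```python
def teu_to_containers(teus, share_40ft=1.0):
    """Estimate a container count from a TEU count.

    share_40ft is an assumed fraction of containers that are 40'
    (2 TEUs each); the remainder are assumed to be 20' (1 TEU each).
    """
    if not 0.0 <= share_40ft <= 1.0:
        raise ValueError("share_40ft must be between 0 and 1")
    avg_teu_per_container = 2 * share_40ft + 1 * (1 - share_40ft)
    return teus / avg_teu_per_container


# If every container were 40', 150,000 TEUs is 75,000 containers.
print(teu_to_containers(150_000))
# With an even mix of 40' and 20' boxes the estimate rises.
print(teu_to_containers(150_000, share_40ft=0.5))
```

The gap between the two outputs is one way to see how much the "most containers are 40'" assumption moves the estimate.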

Could you say anything more specific or concrete about how reading HPMOR changed your life?

Another example: HPMOR inspired me to be more scholarly -> Thinking of HPMOR's Harry, in 2017 I only ask for science books for Christmas -> I buy and read Superintelligence -> {I work on AI risk, I read (text)books regularly}.


HPMOR introduced me to the Sequences (which benefited me as detailed in e.g. Swimming Upstream), and by extension: to LessWrong (and therefore HPMOR must receive some credit for everything I've posted to this site and all of the ideas I've generated), AI risk (now my research area), effective altruism (I just attended EAG in London), CFAR's techniques (Internal Double Crux in particular transformed my internal emotional life) and CFAR's social circle (I'm now polyamorous, which I have found vastly more fulfilling and appropriate than monogamy). 

How

While improvements to moderation are welcome, I suspect it’s even more important to have a common, well-understood goal for the large group of strangers to organize around. For example, Wikipedia did well because the strangers who gathered there already knew what an encyclopedia was.

Tag curation seems a bit like a solution in search of a problem. If we knew what the tags were for, maybe we would be more likely to adopt a tag and try to make a complete collection of things associated with that tag?

Maybe tags (collections of useful articles with something in common) should be created by the researchers who need them? They can be bootstrapped with search. Compare with playlists on YouTube and Spotify.

It seems like a genuinely collaborative project, where articles are intended to be useful and somewhat more evergreen, would probably end up looking something like Wikipedia or perhaps an open source project.

There needs to be some concept of shared goals, a sense of organization and incompleteness, of at least a rough plan with obvious gaps to be filled in. Furthermore, attempts to fill the gaps need to be welcomed.

Wikipedia had the great advantage of previous examples to follow. People already knew what an encyclopedia was supposed to be.

I suspect that at

Wikipedia revolves around its concept of consensus. A system where changes are voted upon and there's no need to have discussions to come to a consensus will have substantially different dynamics and not simply be another Wikipedia.

I’ve played around with Anki a bit, but never used it seriously because I was never sure what I wanted to memorize, versus look up when needed.

I wonder if it might be better to look at it a different way, using a note-taking tool to leverage forgetting rather than remembering? That is, you could use it to take notes and start reviewing cards more seriously when you’re going to take a test. Afterwards, you might slack off and forget things, but you still have your notes.

After all, we write things down so we don’t have to remember them.

Such a tool would be u

You talked about rest, but have you looked into stretches, soaking your wrists in tubs of hot and cold water, ice packs, and so on? I had a different problem (tendonitis) and these helped.

I've tried each of those, and I don't feel like any of them helped though it's hard to tell. I think maybe stretches made it worse?

This isn't my area of expertise, but I found this quote in an article about anticipating climate change in the Netherlands to be food for thought:

If we turn the Netherlands into a fort, we will need to build gigantic dikes, but also, and perhaps more importantly, gigantic pumping stations. This is essential, because at some point we will need to pump all of the water from the Rhine, Meuse, Scheldt and Ems – which by that time will be lower than sea level – over those enormous dikes. The energy costs will be higher – but that is no
...

Here's an earlier paper by Judea Pearl:

Bayesianism and Causality, or, Why I am Only a Half-Bayesian

Hmm. I don't know anything about Galleani, but wanting to inspire the masses to action via "propaganda of the deed" seems incompatible with directly terrorizing the masses? (Excuses about "collateral damage" aside.)

It seems like this might have something to do with tribalism: who do the terrorists consider "us" versus "them"?

I'm not sure this will help in your case, but the usual framework for using causality for calculations seems to be that you have a DAG representing the causal connections between variables (without probabilities) and statistical data. From this, some things can be calculated that couldn't be inferred with statistical data alone.

The cause graph can't usually be inferred from the data. However, some statistical tests could disprove the cause graph. For example, the cause graph might imply that certain statistical variables are independent.
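As a minimal sketch of that last point, suppose the assumed graph is a chain X -> Y -> Z. That graph implies X and Z are independent once Y is known, which the data can contradict. The example below simulates data that really does come from such a chain (linear with Gaussian noise, an assumption made purely for illustration) and checks the implication with a partial correlation, which is only an adequate independence test in this linear-Gaussian setting. All function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, zs, ys):
    """Correlation of x and z after linearly controlling for y."""
    r_xz = pearson(xs, zs)
    r_xy = pearson(xs, ys)
    r_zy = pearson(zs, ys)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


def simulate_chain(n, seed=0):
    """Draw n samples from the chain X -> Y -> Z with unit Gaussian noise."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        z = y + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


xs, ys, zs = simulate_chain(20_000)
print("corr(X, Z)     =", round(pearson(xs, zs), 3))           # clearly nonzero
print("corr(X, Z | Y) =", round(partial_corr(xs, zs, ys), 3))  # close to zero
```

If the partial correlation had come out far from zero, that would count as evidence against the chain graph; a value near zero is merely consistent with it and does not prove it.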

Surveys are really hard to design correctly.

Remember, these were true/false questions, so 50% means no knowledge at all.

This isn't apparent from the data. A score of 50% could mean that nobody knows the answer and everyone is guessing randomly. Or it could mean that 50% of survey-takers know the right answer and 50% mistakenly believe the wrong answer. Or something in between. Without more information, we can't distinguish which is which.
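A small worked example of the ambiguity, under a deliberately simple assumed model: each respondent either knows the right answer, confidently holds the wrong one, or guesses with a fair coin. The function and parameter names are hypothetical.

```python
def expected_score(knows, mistaken, guessing):
    """Expected fraction correct on a true/false question.

    knows:    fraction who know the right answer (always correct)
    mistaken: fraction who believe the wrong answer (never correct)
    guessing: fraction who flip a fair coin (correct half the time)
    """
    if abs(knows + mistaken + guessing - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return knows + 0.5 * guessing


# Three very different populations, all scoring 50%.
print(expected_score(0.0, 0.0, 1.0))    # everyone guesses
print(expected_score(0.5, 0.5, 0.0))    # half know, half are mistaken
print(expected_score(0.25, 0.25, 0.5))  # something in between
```

Since the score only pins down `knows + 0.5 * guessing`, a single percentage cannot separate ignorance from confident error.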

I'd also argue that three of the questions were ambiguous or uncertain:

  • Does the big bang really count as
...