Interesting work! Could this be fixed in training by giving it practice at repeating each token when asked?
Another thing I’ve wondered is how substring operations can work for tokenized text. For example, if you ask for the first letter of a string, it will often get it right. How does that happen, and are there tokens where it doesn’t work?
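One way to picture the first-letter case, as a toy sketch and not a claim about how any real model works: the model only sees token IDs, so the first letter of a string is a fact about its first token's spelling, which would have to be learned per token. The vocabulary and the `tokenize` and `first_letter` helpers below are all made up for illustration.

```python
# Toy vocabulary (hypothetical; real vocabularies have tens of thousands of entries).
VOCAB = {"straw": 0, "berry": 1, "s": 2, "t": 3, "r": 4,
         "a": 5, "w": 6, "b": 7, "e": 8, "y": 9}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


def first_letter(ids):
    """Recover the first letter by looking up the first token's spelling."""
    return ID_TO_TOKEN[ids[0]][0]


ids = tokenize("strawberry")
print(ids)                # [0, 1]
print(first_letter(ids))  # s
```

In this picture, a model would get the first letter wrong for exactly those tokens whose spelling it never picked up in training, which is one way to frame the "are there tokens where it doesn't work?" question.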
I think this is a question about markets, like whether people are more likely to buy healthy versus unhealthy food. Clearly, unhealthy food has an enormous market, but healthy food is doing pretty well too.
Porn is common and it seems closer to unhealthy food. Therapy isn’t so common, but that’s partly because it’s expensive, and it’s not like being a therapist is a rare profession.
Are there healthy versus unhealthy social networks? Clearly, some are more unhealthy than others. I suspect it’s in some ways easier to build a business around mostly-healthy cha...
There's an assumption that the text that language models are trained on can be coherently integrated somehow. But the input is a babel of unreliable and contradictory opinions. Training to convincingly imitate any of a bunch of opinions, many of which are false, may not result in a coherent model of the world, but rather a model of a lot of nonsense on the Internet.
I'm wondering who, if anyone, keeps track of throughput at a port? Ideally there would be some kind of graph of containers shipped per day and we could see long-term shipping trends.
(This is making a bad assumption that containers are fungible, but we would at least have a rough idea of how bad the problem is.)
The Port of Long Beach, whose problems instigated the post the OP is responding to, publishes container movements here:
They're up ~150,000 TEUs since early last year (just plotting the "total" column and eyeballing it). IIRC, most containers are 40' (2 TEUs each), so that's somewhere around 75,000 more containers per month. Note that this is both inbound and outbound, full and empty.
There's enough uncertainty in both of those figures I wouldn't take it to the bank, but it shows that shipping volumes have increased at the port.
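For anyone who wants to redo the arithmetic with a different container mix, here is a rough sketch. It assumes a 40' container counts as 2 TEUs and a 20' container as 1 TEU; the `share_40ft` parameter and the 50% example value are made up for illustration, not taken from the port's data.

```python
TEU_PER_40FT = 2  # a 40' container counts as two twenty-foot equivalent units
TEU_PER_20FT = 1


def teu_to_containers(teu, share_40ft=1.0):
    """Estimate a container count from TEUs, given the share of containers that are 40'."""
    avg_teu_per_container = share_40ft * TEU_PER_40FT + (1 - share_40ft) * TEU_PER_20FT
    return teu / avg_teu_per_container


extra_teu = 150_000  # eyeballed increase from the comment above

print(teu_to_containers(extra_teu))       # 75000.0 if every container is 40'
print(teu_to_containers(extra_teu, 0.5))  # 100000.0 if only half are 40' (hypothetical mix)
```

So the 75,000 figure is a lower bound on the container count: any 20' containers in the mix push the number up.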
Could you say anything more specific or concrete about how reading HPMOR changed your life?
HPMOR introduced me to the Sequences (which benefited me as detailed in e.g. Swimming Upstream), and by extension: to LessWrong (and therefore HPMOR must receive some credit for everything I've posted to this site and all of the ideas I've generated), AI risk (now my research area), effective altruism (I just attended EAG in London), CFAR's techniques (Internal Double Crux in particular transformed my internal emotional life) and CFAR's social circle (I'm now polyamorous, which I have found vastly more fulfilling and appropriate than monogamy).
While improvements to moderation are welcome, I suspect it’s even more important to have a common, well-understood goal for the large group of strangers to organize around. For example, Wikipedia did well because the strangers who gathered there already knew what an encyclopedia was.
Tag curation seems a bit like a solution in search of a problem. If we knew what the tags were for, maybe we would be more likely to adopt a tag and try to make a complete collection of things associated with that tag?
Maybe tags (collections of useful articles with something in common) should be created by the researchers who need them? They can be bootstrapped with search. Compare with playlists on YouTube and Spotify.
It seems like a genuinely collaborative project, where articles are intended to be useful and somewhat more evergreen, would probably end up looking something like Wikipedia or perhaps an open source project.
There needs to be some concept of shared goals, a sense of organization and incompleteness, of at least a rough plan with obvious gaps to be filled in. Furthermore, attempts to fill the gaps need to be welcomed.
Wikipedia had the great advantage of previous examples to follow. People already knew what an encyclopedia was supposed to be.
I suspect that at...
I’ve played around with Anki a bit, but never used it seriously because I was never sure what I wanted to memorize, versus look up when needed.
I wonder if it might be better to look at it a different way, using a note-taking tool to leverage forgetting rather than remembering? That is, you could use it to take notes and start reviewing cards more seriously when you’re going to take a test. Afterwards, you might slack off and forget things, but you still have your notes.
After all, we write things down so we don’t have to remember them.
Such a tool would be u...
You talked about rest but have you looked into stretches, putting your wrists in hot and cold water in tubs, ice packs, and so on? I had a different problem (tendonitis) and these helped.
This isn't my area of expertise, but I found this quote in an article about anticipating climate change in the Netherlands to be food for thought:
If we turn the Netherlands into a fort, we will need to build gigantic dikes, but also, and perhaps more importantly, gigantic pumping stations. This is essential, because at some point we will need to pump all of the water from the Rhine, Meuse, Scheldt and Ems – which by that time will be lower than sea level – over those enormous dikes. The energy costs will be higher – but that is no...
Here's a nice introduction to causal inference in a machine learning context:
ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus
Here's an earlier paper by Judea Pearl:
Bayesianism and Causality, or, Why I am Only a Half-Bayesian
Hmm. I don't know anything about Galleani, but wanting to inspire the masses to action via "propaganda of the deed" seems incompatible with directly terrorizing the masses? (Excuses about "collateral damage" aside.)
It seems like this might have something to do with tribalism: who do the terrorists consider "us" versus "them"?
I'm not sure this will help in your case, but the usual framework for using causality for calculations seems to be that you have a DAG representing the causal connections between variables (without probabilities), plus statistical data. From the two together, some things can be calculated that couldn't be inferred from the statistical data alone.
The causal graph usually can't be inferred from the data. However, statistical tests can sometimes disprove a proposed graph. For example, the graph might imply that certain variables are independent, and the data can contradict that.
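Here is a minimal sketch of that last point, under made-up assumptions: the proposed graph is A -> C <- B with no link between A and B, so it implies A and B are independent. The data is simulated, and the 0.1 correlation threshold is a crude stand-in for a proper significance test. Correlation also only picks up linear dependence, so passing this check would not confirm the graph.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n, a_causes_b, seed=0):
    """Draw n samples; A and B both cause C, and A optionally also causes B."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [(x if a_causes_b else 0.0) + rng.gauss(0, 1) for x in a]
    c = [x + y + rng.gauss(0, 1) for x, y in zip(a, b)]
    return a, b, c


def contradicts_independence(xs, ys, threshold=0.1):
    """Crude check: a large correlation contradicts a graph that implies independence."""
    return abs(pearson(xs, ys)) > threshold


# World 1: the data really comes from A -> C <- B, so the implied independence holds.
a, b, _ = simulate(5000, a_causes_b=False)
print(contradicts_independence(a, b))  # False

# World 2: A also causes B, so the proposed graph is wrong and the data shows it.
a, b, _ = simulate(5000, a_causes_b=True)
print(contradicts_independence(a, b))  # True
```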
Surveys are really hard to design correctly.
Remember, these were true/false questions, so 50% means no knowledge at all.
This isn't apparent from the data. A score of 50% could mean that nobody knows the answer and everyone is guessing randomly. Or it could mean that 50% of survey-takers know the right answer and 50% mistakenly believe the wrong answer. Or something in between. Without more information, we can't distinguish which is which.
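To make that concrete, here is a small sketch of the expected score on one true/false question. It assumes every respondent either knows the right answer, confidently believes the wrong one, or flips a coin; the `expected_score` helper and the example splits are made up for illustration.

```python
def expected_score(know, wrong, guess):
    """Expected fraction correct, given the share of respondents in each group."""
    if abs(know + wrong + guess - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    # Knowers are always right, the mistaken are always wrong, guessers are right half the time.
    return know * 1.0 + wrong * 0.0 + guess * 0.5


print(expected_score(0.0, 0.0, 1.0))    # 0.5  everyone guesses
print(expected_score(0.5, 0.5, 0.0))    # 0.5  half know, half are confidently wrong
print(expected_score(0.25, 0.25, 0.5))  # 0.5  something in between
```

All three populations produce the same 50% aggregate, so the score alone can't tell them apart.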
I'd also argue that three of the questions were ambiguous or uncertain:
A chat log is not a simulation because it uses English for all state updates. It’s a story. In a story you’re allowed to add plot twists that wouldn’t have any counterpart in anything we’d consider a simulation (like a video game), and the chatbot may go along with it. There are no rules. It’s Calvinball.
For example, you could redefine the past of the character you’re talking to, by talking about something you did together before. That’s not a valid move in most games.
There are still mysteries about how a language model chooses its next token at inference ...