Rafael Harth

I'm an independent researcher currently working on a sequence of posts about consciousness. You can send me anonymous feedback here: https://www.admonymous.co/rafaelharth. If it's about a post, you can add [q] or [nq] at the end if you want me to quote or not quote it in the comment section.


Consciousness Discourse
Literature Summaries
Factored Cognition
Understanding Machine Learning



Current LLMs, including GPT-4 and Gemini, are generative pre-trained transformers; other available architectures include recurrent neural networks and state space models. Are you addressing primarily GPTs or also the other variants (which have so far only been used to train smaller large language models)? Or anything that trains based on language input and statistical prediction?

Definitely including other variants.

Another current model is Sora, a diffusion transformer. Does this 'count as' one of the models being made predictions about, and does it count as having LLM technology incorporated?

Happy to include Sora as well

Natural language modeling seems generally useful, as does size; what specifically do you not expect to be incorporated into future AI systems?

Anything that looks like current architectures. If language modeling capabilities of future AGIs aren't implemented by neural networks at all, I get full points here; if they are, there'll be room to debate how much they have in common with current models. (And note that I'm not necessarily expecting they won't be incorporated; I did mean "may" as in "significant probability", not necessarily above 50%.)


Or anything that trains based on language input and statistical prediction?

... I'm not willing to go this far since that puts almost no restriction on the architecture other than that it does some kind of training.

What does 'scaled up' mean? Literally just making bigger versions of the same thing and training them more, or are you including algorithmic and data curriculum improvements on the same paradigm? Scaffolding?

I'm most confident that pure scaling won't be enough, but yeah I'm also including the application of known techniques. You can operationalize it as claiming that AGI will require new breakthroughs, although I realize this isn't a precise statement.

We are eventually going to decide on something to call AGIs, and in hindsight we will judge that GPT-4 etc. do not qualify. Do you expect we will be more right about this in the future than in the past, or, as our AI capabilities increase, do you expect that we will hold increasingly high standards about this?

Don't really want to get into the mechanism, but yes to the first sentence.

Registering a qualitative prediction (2024/02): current LLMs (GPT-4 etc.) are not AGIs, their scaled-up versions won't be AGIs, and LLM technology in general may not even be incorporated into systems that we will eventually call AGIs.

It's not all that arbitrary. [...]

I mean, you're not addressing my example and the larger point I made. You may be right about your own example, but I'd guess that's because you're not thinking of a high-effort post. I honestly estimate that I'm in the top percentile for how much I've been hurt by the reception of my posts on this site, and in no case was the net karma negative. Similarly, I'd guess that if you spent a month on a post that ended up at +9, you would feel a lot more hurt than if this post or a similarly short one ended up at -1, or even -20.

After the conversation, I went on to think about anthropics a lot and worked out a model in great detail. It comes down to something like ASSA (absolute self-sampling assumption). It's not exactly the same, and I think my justification was better, but that's the abbreviated version.

I exchanged a few PMs with a friend who moved my opinion from to , but it was when I hadn't yet thought about the problem much. I'd be extremely surprised if I ever change my mind now (still on ). I don't remember the arguments we made.

A bad article should get negative feedback. The problem is that the resulting karma penalty may be too harsh for a new author. Perhaps there could be a way to disentangle the two? For example, the karma damage could be limited (for new authors only?): no matter how negative a score the article gets, the resulting karma loss would be capped at, let's say, "3 + the number of strong downvotes". But for the purposes of hiding the article from the front page, the original negative score would still apply.
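The cap proposed in the quoted comment can be expressed as a small rule. This is a minimal sketch under stated assumptions: the names `Post`, `author_karma_delta`, and `frontpage_score` are hypothetical, the "new authors only?" part is modeled as a simple boolean flag (the comment leaves it as an open question), and none of this reflects how LessWrong actually computes karma.

```python
from dataclasses import dataclass


@dataclass
class Post:
    score: int             # net karma on the post
    strong_downvotes: int  # number of strong downvotes the post received
    author_is_new: bool    # hypothetical flag: does the cap apply to this author?


def author_karma_delta(post: Post) -> int:
    """Karma change applied to the author's total under the proposed cap."""
    # Non-negative scores, and (in this sketch) established authors, pass through unchanged.
    if post.score >= 0 or not post.author_is_new:
        return post.score
    # The loss is capped at "3 + the number of strong downvotes".
    cap = 3 + post.strong_downvotes
    return max(post.score, -cap)


def frontpage_score(post: Post) -> int:
    """Front-page visibility still uses the original, uncapped score."""
    return post.score
```

For example, a new author's post at -20 with 2 strong downvotes would cost the author only 5 karma, while the front page would still treat the post as being at -20.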

I don't think this would do anything to mitigate the emotional damage. And also, like, getting karma at all is much easier than getting it through posts (and much, much easier than getting it through posts on the topic you happen to care about). If someone can't get karma through comments, or isn't willing to try, man, we probably don't want them to be on the site.

I don't buy this argument because I think the threshold of 0 is largely arbitrary. Many years ago, when LW2.0 was still young, I posted something about anthropic probabilities that I had spent months on (I think; I don't completely remember), and it got like +1 or -1 net karma (from where my vote put it), and I took this extremely hard. I think I avoided the site for about a year. Would I have taken it any harder if it had been negative? I honestly don't think so. I could even imagine that it would have been less painful, because I'd have preferred rejection over "this isn't worth engaging with".

So I don't see why expectations should hinge on +/- 0[1] (why would I be an exception?), and therefore I don't think that works as a rule. More generally, I don't see how you can solve this problem with a rule at all. Consequently, I think "authors will get hurt by people not appreciating their work" is something we just have to accept, even if it's very harsh. In individual cases, the best thing you can probably do is write a comment explaining why the rejection happened (if you actually know the reason), but I don't think anything can be done with norms or rules.

  1. Relatedly, consider students who cry after seeing test results. There is no threshold below which this happens. One person may be happy with a D-, another may consider a B+ to be a crushing disappointment. And neither of those is wrong! If the first person didn't do anything (and perhaps could have gotten an A if they wanted) but the second person tried extremely hard to get an A, then the second person has much more reason to be disappointed. It simply doesn't depend on the grade itself.

What’s the “opposite” of NPD? Food for thought: If mania and depression correspond to equal-and-opposite distortions of valence signals, then what would be the opposite of NPD, i.e. what would be a condition where valence signals stay close to neutral, rarely going either very positive or very negative? I don’t know, and maybe it doesn’t have a clinical label. One thing is: I would guess that it’s associated with a “high-decoupling” (as opposed to “contextualizing”) style of thinking.[4]

I recently listened to this podcast (link to relevant timestamp) with Arthur Brooks. In his work (which I have done zero additional research on, so I have no idea whether it's done well or worth engaging with), he divides people into four quadrants based on whether they have above- or below-average positive emotions and above- or below-average negative emotions. He gives each quadrant a label; the below/below quadrant is called "judges", who according to him are "the people with enormously good judgment who don't get freaked out about anything".

This made sense to me because I think I'm squarely in the low/low camp, and decoupling comes extremely naturally to me and feels effortless (of course, this is also a suspiciously self-serving conclusion). So insofar as his notion of "intensity and frequency of emotions" tracks your distribution of valence signals, the judges quadrant would be the "opposite" of NPD, although I believe it's constructed in such a way that it always contains 25% of the population.

I don't really have anything to add here, except that I strongly agree with basically everything in this post, and ditto for post #3 (and the parts I hadn't thought about before all make a lot of sense to me). I actually feel like a lot of this is just good philosophy/introspection and wouldn't have been out of place in the Sequences or in any other post that's squarely aimed at improving rationality. §2.2 in particular is kinda easy to breeze past because you only spend a few words on it, but imo it's a pretty important philosophical insight.
