I think it's worth noting that I have also had times where I was impressed with your tact. The two examples that jump to mind are 1) a tweet where you gently questioned Nate Silver's position that expressing probabilities as frequencies instead of percentages was net harmful, and 2) your "shut it all down" letter in TIME, especially the part where you talk about being positively surprised by the sanity of people outside the industry and the text about Nina losing a tooth. Both of those struck me as emotionally perceptive.
The thing I wonder every time this topic comes up is: why is this the question raised to our attention? Why aren't we instead asking whether AlphaFold is conscious? Or DALL-E? I'd feel a lot less wary of confirmation bias here if people were as likely to believe that a GPT outputting raw token numbers was conscious as they are to believe it when those tokens are translated into text in their native language.
Also, I think it is worth separating the question of "can LLMs introspect?" (i.e., do they have access to their internal state?) from "are LLMs conscious?"
I'm curious how you'd see moral and long-term considerations playing into this. For instance:
1. Saving for retirement produces no experienced benefit for many years and will only ever complete a single investment cycle in a lifetime.
2. Donating to or working on global health, x-risk, etc. produces no experienced benefit ever in most cases.
Yet, in both cases, individuals seem capable of exercising willpower to do these activities for many years.
I can think of 3 models currently that could explain this:
1. They really are "dead willpower", but your willpower system gets enough income from shorter-term investments to allow it to continue investing in things that will not pay out any time soon.
2. Your willpower system has "stock" that gives it value based on the prediction of experienced benefit that hasn't been experienced yet.
3. You experience satisfaction from seeing the 401k numbers go up or from feeling like a moral person, and that satisfaction is the payout your willpower gets.
How do you see moral and long-term considerations interacting with your toy model?
I do think that this is probably part of my misprediction - that I simply idealize others too much and don't give enough credit to how inconsistent humans actually are. "Idealize" is probably just the Good version of "flatten", with "demonize" being the Bad version, both of which likely exist because it takes fewer neurons to model someone else that way.
I actually just recently had the displeasure of stumbling upon that subreddit, and it made me sad that people wanted to devote their energies to just being unkind, without any goal. So I'm probably also not modeling how my own principle of avoiding offense unless it's helpful would erode over time. I've seen it happen to many public figures on Twitter - it seems to be part of the system.
I like this perspective. I would agree that there is more to knowing and being known by others than simply Aumann Agreement on empirical fact. I also probably have a tendency to expect more explicit goal-seeking from others than myself.
I haven't thought this through before, but I notice two things that affect how open I am. The first is how private the communication is, whether it has non-verbal cues, and whether there's an existing relationship. So right now, I'm not writing this with a desired consequence in mind, but I am filtering some things out subconsciously - if we were talking in person right now, I might launch into a random anecdote, but while writing online I stay on a narrower path.
The second is that I generally only start running my "consequentialist program" once I anticipate that someone may be upset by what I say. The anticipation of offense is what triggers me to think either "but it still needs to be said" or "saying this won't help". So maybe my implicit question was less "why does Eliezer not aim all his communication at his goals" and more "why doesn't he seem to have the same guardrail I do about only causing offense if it will help", which is a more subjective standard.
I accept your correction that I misquoted you. I paraphrased from memory and did miss real nuance. My bad.
Looking at the comment now, I do see that it currently has a score of -43 and is the only negative-karma comment on the post. So maybe a more interesting question is why I (and presumably several others) interpreted it as an insult when the logical content of "Intelligence(having <30y timeline in 2025) > Intelligence(potted plant)" doesn't contain any direct insult. My best guess is that people are running informal inference on "do they think of me as lower status", and any comparison to a lower-intelligence entity is likely to trigger that. For instance, I actually find the thing you just said - suggesting that I could have an LLM explain an LSAT-style question to me - to be insulting, because it implies that you assign decent probability to my intelligence being lower than LLM or LSAT level. (Of course, I rank it as less bad than "calling someone out publicly, even politely", so I still feel a vague social debt to you in this interaction.) I also anticipate that you might respond that you are justified in that assumption given that I seem not to have understood something an LLM could, and that that would only serve to increase the perceived status threat.
The "polite about the house burning" is something I have changed my mind about recently. I initially judged some of your stronger rhetoric as unhelpful because it didn't help me personally, but have seen enough people say otherwise that I now lean toward that being the right call. The remaining confusion I have is over the instances where you take extra time to either raise your own status or lower someone else's instead of keeping discussion focused on the object level. Maybe that's simply because, like me, you sometimes just react to things. Maybe, as someone else suggested, its some sort of punishment strategy. If it is actually intentionally aimed at some goal, I'd be curious to know.
I'm sorry to hear about your health/fatigue. That's a very unfortunate turn of events, for everyone really. I think your overall contribution is quite positive, so I would certainly vote that you keep talking rather than stop! If I got a vote on the matter, I'd also vote that you leave status out of conversations and play to your strength of explaining complicated concepts in a way that is very intuitive for others. In fact, as much as I had high hopes for your research prospects, I never directly experienced any of that - the thing that has directly impressed me (and, if I'm honest, the only reason I assume you'd also be great at research) has been the way you make new insights accessible through your public writing. So, consider this my vote for more of that.
I suspect that some of my dissonance does result from an illusion of consistency and a failure to appreciate how multi-faceted people can really be. I naturally think of people as agents and not as a collection of different cognitive circuits. I'm not ready to assume that this explains all of the gap between my expectations and reality, but it's probably part of it.
I think this is an important perspective, especially for understanding Eliezer, who places a high value on truth/honesty, often directly over consequentialist concerns.
While this explains true but unpleasant statements like "[Individual] has substantially decreased humanity's odds of survival", it doesn't seem to explain statements like the potted plant one or other obviously-not-literally-true statements, unless one takes the position that full honesty also requires saying all the false and irrational things that pass through one's head. (And even then, I'd expect an immediate follow-up of "that's not true, of course".)
I agree with this decision. You reference the comment in one of your answers. If it starts taking over, it should be removed, but can otherwise provide interesting meta-commentary.
I just tried criticizing my ingroup. Did my blood boil? No. My Scotsmen got truer. Every time I could identify a flawed behavior, it felt inappropriate to include those people in my "real ingroup". Now, if I had a more objectively defined group based on voting record or religious belief or something, then maybe I'd be able to force my brain to keep them in my ingroup, but right now, my brain flips to "sure, I'm happy to criticize those people giving us a bad name. Look, I'm criticizing my ingroup!"
I tried 2 other experiments:
1. Think about criticisms toward my ingroup that do make me angry - maybe those are the ones hitting home.
Result: I found myself disagreeing with all of them. And my brain asked "what, am I supposed to like wrongheaded arguments just because they are against my group?"
2. Just go straight for the inner-est group I have: me.
Result: I was able to think of criticisms of myself, and it didn't make my blood boil; writing them down wouldn't either. I suspect that when I shrink the group to {me}, I may expect extra social points for criticizing myself, which makes it much more palatable.
So, my quick experiment suggests that, at least for someone without a clearly defined in-group, trying to criticize one's ingroup can be more 'slippery' difficult than 'grueling' difficult.