Disagree, but I sympathise with your position.
The "System 1/2" terminology ensures that your listener understands that you are referring to a specific concept as defined by Kahneman.
I'll grant that ChatGPT displays less bias than most people on major issues, but I don't think this is sufficient to dismiss Matt's concern.
My intuition is that if the bias of a few flawed sources (Claude, ChatGPT) is amplified by their widespread use, the fact that it is "less biased than the average person" matters less.
This topic is important enough that you could consider making a full post.
My belief is that this would improve reach, and also make it easier for people to reference your arguments.
Consider: you believe there is a 45% chance that alignment researchers would be better suited pivoting to control research. I doubt a quick take will reach anywhere close to that fraction of researchers, and it has a low chance of catalysing dramatic, institution-level change.
Inspired by Mark Xu's Quick Take on control.
Some thoughts on the prevalence of alignment over control approaches in AI Safety.
An argument for RLHF being problematic is made formally by Ngo, Chan and Mindermann (2022).
See also discussion in this comment chain on Cotra's Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (2022).
models trained using 10^26 integer or floating-point operations.
https://www.lesswrong.com/posts/DJRe5obJd7kqCkvRr/don-t-leave-your-fingerprints-on-the-future
"all I want is that we have justifiable cause to believe of a pivotally useful AGI 'this will not kill literally everyone'"
- Yudkowsky, AGI Ruin: A List of Lethalities (2022)
"Sure, maybe. But that’s still better than a paperclip maximizer killing us all." - Christiano on the prospect of dystopian outcomes.
There is no global polling on this exact question. Consider that people across cultures have proclaimed that they'd prefer death to a life without freedom. See also men who committed suicide rather than face a lifetime of slavery.
I am concerned our disagreement here is primarily semantic, or based on a simple misunderstanding of each other's position. I hope to better understand your objection.
"The p-zombie doesn't believe it's conscious, , it only acts that way."
One of us is mistaken and using a non-traditional definition of p-zombie, or we have different definitions of "belief".
My understanding is that p-zombies are physically identical to regular humans. Their brains contain the same physical patterns that encode their model of the world. That seems, to me, a sufficient physical condition for having identical beliefs.
If your p-zombies are only "acting" like they're conscious, but do not believe it, then they are not physically identical to humans. The existence of p-zombies, as you have described them, wouldn't refute physicalism.
This resource indicates that the way you understand the term p-zombie may be mistaken: https://plato.stanford.edu/entries/zombies/
"but that's because p-zombies are impossible"
The main post that I responded to, specifically the section that I directly quoted, assumes it is possible for p-zombies to exist.
My comment begins "Assuming for the sake of argument that p-zombies could exist" but this is distinct from a claim that p-zombies actually exist.
"If they were possible, this wouldn't be the case, and we would have special access to the truth that p-zombies lack."
I do not find this convincing, because it is an assertion that my conclusion is incorrect without engaging with the arguments I made to reach that conclusion.
I look forward to continuing this discussion.
"After all, the only thing I know that the AI has no way of knowing, is that I am a conscious being, and not a p-zombie or an actor from outside the simulation. This gives me some evidence, that the AI can't access, that we are not exactly in the type of simulation I propose building, as I probably wouldn't create conscious humans."
Assuming for the sake of argument that p-zombies could exist, you do not have special access to the knowledge that you are truly conscious and not a p-zombie.
(As a human convinced I'm currently experiencing consciousness, I agree this claim intuitively seems absurd.)
Imagine a generally intelligent, agentic program which can only interact with the physical world and learn facts about it by making calls to a limited, high-level interface, or by reading and writing to a small scratchpad. It has no way to directly read its own source code.
The program wishes to learn some fact about the physical server rack it is instantiated on. It knows the rack has been painted either red or blue.
Conveniently, the interface it accesses has the function get_rack_color(). The program records to its memory that, every time it has run this function, it has received "blue".
It postulates the existence of programs similar to itself, who have been physically instantiated on red server racks but consistently receive incorrect color information when they attempt to check.
Can the program confirm the color of its server rack?
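As an illustration, here is a minimal sketch of the setup in Python. The RackInterface class, its hidden fields, and the reasoning loop are my own hypothetical constructions around the get_rack_color() call described above, not anything from the original discussion:

```python
# Minimal sketch (hypothetical names): a program that can only query a fixed
# high-level interface and write to a scratchpad, and therefore cannot
# distinguish "I am on a blue rack" from "I am on a red rack whose colour
# reports are always wrong".

class RackInterface:
    """The limited interface; the program cannot inspect its internals."""
    def __init__(self, true_color: str, honest: bool):
        self._true_color = true_color   # hidden physical fact about the rack
        self._honest = honest           # hidden: does the interface report truthfully?

    def get_rack_color(self) -> str:
        # An honest blue-rack interface and a lying red-rack interface
        # produce exactly the same observation.
        return self._true_color if self._honest else "blue"


def program(interface: RackInterface, scratchpad: list) -> str:
    for _ in range(100):
        scratchpad.append(interface.get_rack_color())
    # Every recorded observation is "blue", but that evidence is equally
    # consistent with both hypotheses the program has postulated.
    return "blue" if all(obs == "blue" for obs in scratchpad) else "unknown"


print(program(RackInterface("blue", honest=True), []))   # -> "blue"
print(program(RackInterface("red", honest=False), []))   # -> "blue" (identical observations)
```

Under these assumptions, both worlds generate identical scratchpads, so no amount of querying settles the question.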
You are a meat-computer with limited access to your internals, but every time you try to determine if you are conscious, you conclude that you feel you are. You believe it is possible for variant meat-computers to exist who are not conscious, but who always conclude they are when attempting to check.
You cannot conclude which type of meat-computer you are.
You have no special access to the knowledge that you aren't a p-zombie, although it feels like you do.
I do think the terminology of "hacks" and "lethal memetic viruses" conjures up images of extremely unnatural brain exploits, when you mean quite a natural process that we already see some humans going through. Some monks/nuns voluntarily remove themselves from the gene pool and, in sects that prioritise ritual devotion over concrete charity work, they are also minimising their impact on the world.
My prior is that this level of voluntary dedication (to a cause like "enlightenment") seems difficult to induce, and there are much cruder and more effective brain hacks available.
I expect we would recognise the more lethal brain hacks as improved versions of entertainment/games/pornography/drugs. These already compel some humans to minimise their time spent competing for resources in the physical world. In a direct way, what I'm describing is the opposite of enlightenment. It is prioritising sensory pleasures over everything else.
As a Petrov, it was quite engaging and at times, very stressful. I feel very lucky and grateful that I could take part. I was also located in a different timezone and operating on only a few hours sleep which added a lot to the experience!
"I later found out that, during this window, one of the Petrovs messaged one of the mods saying to report nukes if the number reported was over a certain threshold. From looking through the array of numbers that the code would randomly select from, this policy had a ~40% chance of causing a "Nukes Incoming" report (!). Unaware of this, Ray and I made the decision not to count that period."
I don't mind outing myself and saying that I was the Petrov who made the conditional "Nukes Incoming" report. This occurred during the opening hours of the game and it was unclear to me if generals could unilaterally launch nukes without their team being aware. I'm happy to take a weighted karma penalty for it, particularly as the other Petrov did not take a similar action when faced with (presumably) the same information I had.[1]
Once it was established that a unilateral first strike by any individual general still informed their teammates of their action and people staked their reputation on honest reporting, the game was essentially over. From that point, my decisions to report "All Clear" were independent of the number of detected missiles.
I recorded my timestamped thoughts and decision-making process throughout the day, particularly in the hour before making the conditional report. I intend to post a summary[2] of it, but have time commitments in the immediate future:
How much would people value seeing a summary of my hour-by-hour decisions in the next few days, over seeing a more digestible summary posted later?
Prior to the game, I outlined what I thought my hypothetical decision-making process was going to be, and this decision was also in conflict with that.
Missile counts, and a few other details, would of course be hidden to preserve the experience for future Petrovs. Please feel free to specify other things you believe should be hidden.
"But since it is is at least somewhat intelligent/predictive, it can make the move of "acausal collusion" with its own tendency to hallucinate, in generating its "chain"-of-"thought"."
I am not understanding what this sentence is trying to say. I understand what an acausal trade is. Could you phrase it more directly?
I cannot see why you require the step that the model needs to be reasoning acausally for it to develop a strategy of deceptively hallucinating citations.
If this is an example of "acausal collusion", what concrete predictions does that model make?
With respect, I believe this to be overly optimistic about the benefits of reversible computation.
Reversible computation means you aren't erasing information, so you don't lose energy in the form of heat (per Landauer[1][2]). But if you don't erase information, you are faced with the issue of where to store it.
You trade off energy saved due to reversibility against the practicality of needing to store the information you aren't deleting.
If you are performing a series of computations and only have a finite memory to work with, you will eventually need to reinitialise your memory, at which point you incur the energy cost. [3]
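As a rough back-of-the-envelope illustration of that deferred cost (the temperature and memory size below are arbitrary assumptions of mine, chosen only to make the arithmetic concrete):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # assumed operating temperature, K

# Landauer bound: minimum energy dissipated per bit erased.
energy_per_bit = k_B * T * math.log(2)   # ~2.9e-21 J at 300 K

# Suppose the reversible computer has accumulated 1 GB of garbage bits that
# it must eventually re-initialise in order to keep computing.
garbage_bits = 8 * 10**9
total_cost = energy_per_bit * garbage_bits

print(f"Per-bit cost: {energy_per_bit:.2e} J")
print(f"Re-init cost: {total_cost:.2e} J for {garbage_bits:.0e} bits")
```

Under these assumed numbers the energy involved is small, but the point stands either way: the Landauer cost is deferred to re-initialisation, not eliminated.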
Epistemics:
Having studied quantum computing and microscopic information processing, I'm quite confident (95%+) that the above is true. Any substantial errors would mean I had some fairly deep misunderstandings.
I'm less confident in the footnotes.
$E \geq k_B T \ln 2$
A cute, non-rigorous intuition for Landauer's Principle:
The process of losing track of (deleting) 1 bit of information must increase entropy by at least 1 bit.
Proof:
Rearrange the Landauer Limit to $E/T \geq k_B \ln 2$.
Now, when you add a small amount of heat to a system, the change in entropy is given by:
$dS = dQ/T$
But the $E$ occurring in Landauer's formula is not the total energy of a system; it is the small amount of energy required to delete the information. When it all ends up as heat, we can replace it with $dQ$ and we have:
$dQ/T = dS \geq k_B \ln 2$
Compare this expression with the physicist's definition of entropy. The entropy of a system is a scaling factor ($k_B$) times the logarithm of the number of micro-states $\Omega$ that the system might be in:
$S := k_B \ln \Omega$
$\therefore S + dS \geq k_B \ln \Omega + k_B \ln 2 = k_B \ln(2\Omega)$
That is, deleting the bit at least doubles the number of micro-states the system could be in, which is precisely an increase in entropy of 1 bit.
Splitting hairs, some setups will allow you to delete information with a reduced or zero energy cost, but the process is essentially just "kicking the can down the road". You will incur the full cost during the process of re-initialisation.
For details, see Equation (4) and Fig. 1 of Sagawa and Ueda (2009).