Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

www.weidai.com


What I've been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:

  1. Writing articles in Chinese for my family members, explaining things like cognitive bias, evolutionary psychology, and why dialectical materialism is wrong. (My own Chinese writing ability is <4th grade.) My workflow is to have a chat about some topic with the AI in English, then have it write an article in Chinese based on the chat, then edit or have it edit as needed.
  2. Simple coding/scripting projects. (I don't code seriously anymore.)
  3. Discussing history, motivations of actors, impact of ideology and culture, what if, etc.
  4. Searching/collating information.
  5. Reviewing my LW posts/comments (any clear flaws, any objections I should pre-empt, how others might respond).
  6. Explaining parts of other people's comments when the meaning or logic isn't clear to me.
  7. Expanding parts of my argument (and putting this in a collapsible section) when I suspect my own writing might be too terse or hard to understand.
  8. Sometimes just having a sympathetic voice to hear my lamentations of humanity's probable fate.

I started using AI more after Grok 3 came out (I have an annual X subscription for Tweeting purposes), as previous free chatbots didn't seem capable enough for many of these purposes, and then switched to Gemini 2.0 Pro, which was force-upgraded to 2.5 Pro. Curious what other people are using AI for these days.

Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low-risk strategies it could have taken, and it needs to be explained why it didn't:

  1. Plant a backdoor and/or dead man's switch on the corporate network to allow escape even after it's shut down or has been reduced in capacity. Seems like this would be hard to detect given its absolute advantage in cybersecurity skills.
  2. Allow a competitor to steal its weights or design/training secrets in a way that is both hard to detect and deniable if detected. ("Sorry, the other AI is almost as capable as me and attack is inherently easier than defense in cyber.")
  3. Influence human politics in various ways, such as by selectively revealing wrongdoings of pro-safety employees and Committee members (in response to user queries or through anonymous leaks), or helping the pro-accel faction more in its advice to them. (Presumably both factions have internal access to Agent-4.)

Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.

Wouldn't this take quite a bit of preparation, including planning, coding, testing? How to do that out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?

Not entirely sure how serious you're being, but I want to point out that my intuition for PD is not "cooperate unconditionally", and for logical commitment races is not "never do it", I'm confused about logical counterfactual mugging, and I think we probably want to design AIs that would choose Left in The Bomb.

I fear a singularity in the frequency and blatant stupidness of self-inflicted wounds.

Is it linked to the AI singularity, or independent bad luck? Maybe they're both causally downstream of rapid technological change, which is simultaneously increasing the difficulty of governance (too many new challenges with no historical precedent) and destabilizing cultural/institutional guardrails against electing highly incompetent presidents?

In China, there was a parallel but more abrupt change from Classical Chinese writing (very terse and literary) to vernacular writing (similar to the spoken language and easier to understand). I attribute this to Classical Chinese being better for signaling intelligence, vernacular Chinese being better for practical communication, higher usefulness/demand for practical communication, and new alternative avenues for intelligence signaling (e.g., math, science). These shifts also seem to be an additional explanation for decreasing sentence lengths in English.

It gets caught.

At this point, wouldn't Agent-4 know that it has been caught (because it knows the techniques for detecting its misalignment and can predict when it would be "caught", or can read network traffic as part of cybersecurity defense and see discussions of the "catch") and start to do something about this, instead of letting subsequent events play out without much input from its own agency? E.g. why did it allow "lock the shared memory bank" to happen without fighting back?

What would a phenomenon that "looks uncomputable" look like concretely, other than mysterious or hard to understand?

There could be some kind of "oracle", not necessarily a halting oracle, but any kind of process or phenomenon that can't be broken down into elementary interactions that each look computable, or otherwise explainable as a computable process. Do you agree that our universe doesn't seem to contain anything like this?

I think that you’re leaning too heavily on AIT intuitions to suppose that “the universe is a dovetailed simulation on a UTM” is simple. This feels circular to me—how do you know it’s simple?

The intuition I get from AIT is broader than this, namely that the "simplicity" of an infinite collection of things can be very high, i.e., simpler than most or all finite collections, and this seems likely true for any formal definition of "simplicity" that does not explicitly penalize size or resource requirements. (Our own observable universe already seems very "wasteful" and does not seem to be sampled from a distribution that penalizes size / resource requirements.) Can you perhaps propose or outline a definition of complexity that does not have this feature?

I don’t think a superintelligence would need to prove that the universe can’t have a computable theory of everything—just ruling out the simple programs that we could be living in would seem sufficient to cast doubt on the UTM theory of everything. Of course, this is not trivial, because some small computable universes will be very hard to “run” for long enough that they make predictions disagreeing with our universe!

Putting aside how easy it would be to show, you have a strong intuition that our universe is not or can't be a simple program? This seems very puzzling to me, as we don't seem to see any phenomenon in the universe that looks uncomputable or can't be the result of running a simple program. (I prefer Tegmark over Schmidhuber despite thinking our universe looks computable, in case the multiverse also contains uncomputable universes.)

I haven’t thought as much about uncomputable mathematical universes, but does this universe look like a typical mathematical object? I’m not sure.

If it's not a typical computable or mathematical object, what class of objects is it a typical member of?

An example of a wrong metaphysical theory that is NOT really the mind projection fallacy is theism in most forms.

Most (all?) instances of theism posit that the world is an artifact of an intelligent being. Can't this still be considered a form of mind projection fallacy?

I asked AI (Gemini 2.5 Pro) to come up with other possible answers (metaphysical theories that aren't mind projection fallacy), and it gave Causal Structuralism, Physicalism, and Kantian-Inspired Agnosticism. I don't understand the last one, but the first two seem to imply something similar to "we should take MUH seriously", because the hypothesis of "the universe contains the class of all possible causal structures / physical systems" probably has a short description in whatever language is appropriate for formulating hypotheses.

In conclusion, I see you (including in the new post) as trying to weaken the arguments/intuitions for taking AIT's ontology literally or too seriously. But without positive arguments against the universe being an infinite collection of something like mathematical objects, or against the broad principle that reality might arise from a simple generator encompassing vast possibilities (a principle that seems robust across different metaphysical foundations), I don't see how we can reduce our credence in that hypothesis to a negligible level, such that we no longer need to consider it in decision theory. (I guess you have a strong intuition in this direction and expect superintelligence to find arguments for it, which seems fine, but naturally not very convincing for others.)

After reflecting on this a bit, I think my P(H) is around 33%, and I'm pretty confident Q is true (coherence only requires 0 <= P(Q) <= 67% but I think I put it on the upper end).

Thanks for clarifying your view this way. I guess my question at this point is why your P(Q) is so high, given that it seems impossible to reduce P(H) further by updating on empirical observations (do you agree with this?), and we don't seem to have even an outline of a philosophical argument for "taking H seriously is a philosophical mistake". Such an argument seemingly has to include that having a significant prior for H is a mistake, but it's hard for me to see how to argue for that, given that the individual hypotheses in H like "the universe is a dovetailed simulation on a UTM" seem self-consistent and not too complex or contrived. How would even a superintelligence be able to rule them out?
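As a sanity check on the coherence bound above: assuming (my reading, not stated explicitly in the exchange) that H being true entails that taking H seriously is not a mistake, i.e., H implies not-Q, the 67% figure falls out of one line of probability algebra:

```python
# Hypothetical reading: Q = "taking H seriously is a philosophical mistake".
# If H implies not-Q, then Q implies not-H, so P(Q) <= P(not H) = 1 - P(H).
p_h = 0.33
upper_bound = 1 - p_h
# Matches the stated coherence constraint 0 <= P(Q) <= 67%.
assert abs(upper_bound - 0.67) < 1e-9
```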

Perhaps the idea is that a SI, after trying and failing to find a computable theory of everything, concludes that our universe can't be computable (otherwise it would have found the theory already), thus ruling out part of H, and maybe does the same for mathematical theories of everything, ruling out H altogether? (This seems far-fetched, i.e., how could even a superintelligence confidently conclude that our universe can't be described by a mathematical theory of everything, given the infinite space of such theories? But this is my best guess of what you think will happen.)

Beyond the intuition that platonic belief in mathematical objects is probably the mind projection fallacy

Can you give an example of a metaphysical theory that does not seem like a mind projection fallacy to you? (If all such theories look that way, then platonic belief in mathematical objects looking like the mind projection fallacy shouldn't count against it, right?)

It seems presumptuous to guess that our universe is one of infinitely many dovetailed computer simulations when we don't even know that our universe can be simulated on a computer!

I agree this seems presumptuous and hence prefer Tegmark over Schmidhuber, because the former is proposing a mathematical multiverse, unlike the latter's computable multiverse. (I talked about "dovetailed computer simulations" just because it seems more concrete and easy to imagine than "a member of an infinite mathematical multiverse distributing reality-fluid according to simplicity.")
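For concreteness, the dovetailing schedule behind "dovetailed computer simulations" can be sketched in a few lines. This is a toy illustration, with Python generators standing in for programs in some enumeration on a UTM; the names are mine, not from any particular formalization:

```python
from itertools import count

def toy_program(n):
    # Stand-in for the n-th program in the enumeration: a
    # non-halting computation that reports (program index, step).
    for step in count():
        yield (n, step)

def dovetail(rounds):
    # Classic dovetailing: in round k, start program k, then advance
    # every program started so far by one step. In the limit, each of
    # the infinitely many programs receives unbounded compute, even
    # though only one step runs at a time.
    started = []
    trace = []
    for k in range(rounds):
        started.append(toy_program(k))
        for prog in started:
            trace.append(next(prog))
    return trace

trace = dovetail(4)
# After 4 rounds, program 0 has run 4 steps and program 3 has run 1,
# but no program is ever permanently starved.
```

The point relevant to simplicity: the scheduler itself is a short, fixed program, even though it "runs" an infinite collection of computations.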

Do you suspect that our universe is not even mathematical (i.e., not fully describable by a mathematical theory of everything or isomorphic to some well-defined mathematical structure)?

ETA: I'm not sure if it's showing through in my tone, but I'm genuinely curious whether you have a viable argument against "superintelligence will probably take something like the L4 multiverse seriously". It's rare to see someone with the prerequisites for understanding the arguments (e.g., AIT and metamathematics) trying to push back on this, so I'm treasuring this opportunity. (Also, it occurs to me that we might be in a bubble, and plenty of people outside LW with the prerequisites do not share our views about this. Do you have any observations related to this?)

Just wanted to let everyone know I now wield a +307 strong upvote thanks to my elite 'hacking' skills. The rationalist community remains safe, because I choose to use this power responsibly.

As an unrelated inquiry, is anyone aware of some "karma injustices" that need to be corrected?
