I am a PhD student in computer science at the University of Waterloo, supervised by Professor Ming Li and advised by Professor Marcus Hutter.
My current research is related to applications of algorithmic probability to sequential decision theory (universal artificial intelligence). Recently I have been trying to start a dialogue between the computational cognitive science and UAI communities. Sometimes I build robots, professionally or otherwise. Another hobby (and a personal favorite of my posts here) is the Sherlockian abduction master list, which is a crowdsourced project seeking to make "Sherlock Holmes" style inference feasible by compiling observational cues. Give it a read and see if you can contribute!
See my personal website colewyeth.com for an overview of my interests and work.
I do ~two types of writing: academic publications and (lesswrong) posts. With the former I try to be careful enough that I can stand by ~all (strong/central) claims in 10 years, usually by presenting a combination of theorems with rigorous proofs and only more conservative intuitive speculation. With the latter, I try to learn enough by writing that I have changed my mind by the time I'm finished - and though I usually include an "epistemic status" to suggest my (final) degree of confidence before posting, the ensuing discussion often changes my mind again. As of mid-2025, I think that the chances of AGI in the next few years are high enough (though still <50%) that it's best to focus on disseminating safety-relevant research as rapidly as possible, so I'm focusing less on long-term goals like academic success and the associated incentives. That means most of my work will appear online in an unpolished form long before it is published.
What kind of knowledge specifically are these lawyers looking for?
I don’t necessarily disagree that these guesses are plausible, but I don’t think it’s possible to predict exactly what emulation world ends up looking like, and even your high level description of the dynamics looks very likely to be wrong.
The goal is to become one of the early emulations and shape the culture, regulations, technology etc. into a positive and stable form - or at least, into carefully chosen initial conditions.
There’s a difference between building a model of a person and using that model as a core element of your decision making algorithm. So what you’re describing seems even weaker than weak necessity.
However, I agree that some of the ideas I've sketched are pretty loose. I'm only trying to provide a conceptual frame and work out some of its implications.
Skill issue, past me endorses current me.
Yes, I think what you’re describing is basically CIRL? This can potentially achieve incremental uploading. I just see it as technically more challenging than pure imitation learning. However, it seems conceivable that something like CIRL is needed during some kind of “takeoff” phase, when the (imitation learned) agent tries to actively learn how it should generalize by interacting with the original over longer time scales and while operating in the world. That seems pretty hard to get right.
This is interesting, but I again caution that fine tuning a foundation model is unlikely to result in an emulation which generalizes properly. Same (but worse) for prompting.
Maybe, but we usually endorse the way that our values change over time, so this isn’t necessarily a bad thing.
Also, I find it hard to imagine hating my past self so much that I would want to kill him or allow him to be killed. I feel a certain protectiveness and affection for my self 10 or 15 years ago. So I feel like at least weak upload sufficiency should hold, do you disagree?
The first concern seems like a much smaller risk than the one we currently face from unaligned AI. To be clear, I'm suggesting emulations of a relatively large number of people (more than 10, at least once the technology has been well tested, and eventually perhaps everyone). If some of them turn out to be evil sociopaths, the others will just have to band together and enforce norms, exactly like we do now.
The second concern sounds like gradual disempowerment to me. However, I think there are a lot of ways for Corporation A to win. Perhaps Corporation B is regulated out of existence - reckless modifications should violate some sort of human alignment code. Perhaps we learn how to recursively self improve as emulations, in such a way that the alignment tax is near 0, and then just ensure that initial conditions modestly favor Corporation A (most companies adopt reasonable standards, and over time control most of the resources). Or perhaps corporate power is drastically reduced and emulations are able to coordinate once their intelligence is sufficiently boosted. Or perhaps a small team of early emulations performs a pivotal act. Basically, I think this is something our emulations can figure out.
Well, that’s a concise enough explanation of why I don’t drink.
I expect this to start not happening right away.
So at least we’ll see who’s right soon.