I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.
I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:
Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:
People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.
I'm interested to know how (if at all) you'd say the perspective you've just given deviates from something like this:
My current guess is that you agree with some reasonable interpretation of all these points, and maybe also have some further nuance you think is important?
Given the picture I've suggested, the relevant questions are:
A complementary angle: we shouldn't be arguing over whether or not we're in for a rough ride; we should be figuring out how not to have one.
I suspect more people would be willing to get behind, both empirically and theoretically, 'ruthless consequentialist maximisers are one extreme of a spectrum which gets increasingly scary and dangerous; it would be bad if those got unleashed'.
Sure, skeptics can still argue that this just won't happen even if we sit back and relax. But I think then it's clearer that they're probably making a mistake (since origin stories for ruthless consequentialist maximisers are many and disjunctive). So the debate becomes 'which sources of supercompetent ruthless consequentialist maximisers are most likely and what options exist to curtail that?'.
"This short story perfectly depicts the motivations and psychological makeup of my milieu," I think wryly as I strong upvote. I'm going to need to discuss this at length with my therapist. Probably the author is one of those salty mid-performing engineers who didn't get the offer they wanted from Anthropic or whatever. That thought cheers me up a little.
Esther catches sight of the content on my screen over my shoulder. "I saw that too," she remarks, looking faintly worried in a way which reminds me of why I am hopelessly in love with what she represents. "Are we, like, the bad guys, or maybe deluding ourselves that we're the good guys in a bad situation? It seems like that author thinks so. It does seem like biding my time hasn't really got me any real influence yet."
I rack my brain for something virtuous to say. "Yeah, um, safety-washing is a real drag, right?" Her worry intensifies, so I know I'm pronouncing the right shibboleths. God, I am really spiritually emaciated right now. I need to cheer her up. "But think about it, we really are in the room, right? Who else in the world can say that? It's not like Vox or Krishna are going to wake up any time soon. That's a lot of counterfactual expected impact."
She relaxes. "You're right. Just need to keep vigilant for important opportunities to speak up. Thanks." We both get back to tuning RL environments and meta-ML pipelines.
Yeah, 'clones' is probably better conceptually, if a mouthful.
Shared utility functions and merging.
Kind of an aside, but I think it's underappreciated (including in economics and game theory) how much humans actually can do this sort of 'exotic' thing. We totally merge utility functions a bit all the time! With friends and family and colleagues and acquaintances. Crudely modelling, we have something like an affection/altruism coefficient for people we recognise (and even for members of abstract groups/coalitions we recognise or conceptualise). And besides this innate thing, we formally and normatively erect firms, institutions, etc., which embody heuristically merged preference mappings, and so on.
I'm not aware of useful theory which relates to this.
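To sketch the crude model in symbols (purely illustrative notation on my part, not an established formalism): give each agent $i$ an effective utility

$$U_i^{\mathrm{eff}}(x) \;=\; U_i(x) \;+\; \sum_{j \neq i} \alpha_{ij}\, U_j(x), \qquad \alpha_{ij} \in [0, 1],$$

where $\alpha_{ij}$ is $i$'s affection/altruism coefficient for $j$ (or for a group $j$ belongs to). Partial merging is small positive $\alpha$s among friends, family, colleagues; full merging is a coalition acting on a single weighted sum of its members' utilities.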
We'll go from individual minds to super-minds, just-out minds.
So, I don't think this is really a qualitative change.
In sad cases, humans as constituents of super-minds go extinct, and machine minds (and coalitions of minds, and overlapping coalitions of coalitions etc.) continue to exhibit super-minding. In happy cases, humans as willing and participant members of super-minds continue to flourish, in part by virtue of the super-minds' competence and coherence, and in part on account of a sufficient internal balance of liberality and temperance.
An aside: I think sometimes conceptualisation of super-coordination or super-minds etc. is unfortunately quite hierarchical[1], quite feudalistic. I tentatively think modern humans benefit a lot from belonging to overlapping coalitions and communities, unlike in the analogy to multi-gene genomes, multi-organelle cells, or multi-cellular organisms. And in any case, it looks pretty difficult and harmful to go from where we are today to a more rigidly tree-like structure of social relations, even if humans could live just fine or even flourish in such conditions.
I don't mean this in a sort of pejorative 'power relations/inequality' way; I mean it in the 'structured like a tree' way, where there aren't overlaps or cross-links between subcommunities.
I resonate a lot with Beren's perspective. Definitely 'AI polytheism' (and this is a great term!) is a neglected perspective.
And definitely there are some refinements needed to a naive 'values are hyper-specific' perspective: unquestionably there are evopsych+game theory selection stories for many of our drives, 'biases', heuristics, etc., as well as for our socially-developed institutions and norms. They are plausibly even 'convergent' to varying degrees in some region (though this is unclear).
I worry it's a sleight of hand, though, to call these 'human values' in the same sense as is meant by those concerned about erosion/destruction of such human values (whether acutely or gradually).
Importantly, of course I can imagine a parallel economy and society of machines exhibiting some behavioural analogues of trust, reputation, coalitions, affection, play. But I don't see any good reason to be confident that those coalitions, that affection, the capacity to engage in trust etc. would be inclusive of humans, or even of machines with the relevant subjective experience to appreciate it. Corporations are a good example: it's great that they have identity, reputation, the capacity to enter into agreements and so on, because it enables coordination. But I absolutely don't care about the corporation for its own sake, and a world of only corporations would be a dead one.
I'm sad because I thought this was obvious and well-known. I still think that sensible use of generated code+proofs is a pretty exciting prospect/unlock for the sensible parts of humanity.
'Third tier' priority (for 3x) etc?
(I forgot that more conversation might happen on a LW crosspost, and I again lament that the internet has yet to develop a unified routing system for same-content-different-edition discourse. Copied comment from a few days ago on Substack:)
I really appreciate this (and other recent) transparency. This is much improved since AI 2027.
One area I get confused by (same with Davidson, with whom I've discussed this a bit) is 'research taste'. When you say things like 'better at research taste', and when I look at your model diagram, it seems you're thinking of taste as a generic competence. But what is taste? It's nothing but a partially-generalising learned heuristic model of experiment value-of-information. (Said another way, it's a heuristic value function for the 'achieve insight' objective of research).
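(In symbols, purely as illustrative shorthand:

$$\text{taste}(e) \;\approx\; \widehat{\mathrm{VOI}}(e) \;=\; \text{a learned estimate of } \mathbb{E}[\,\text{insight gained} \mid \text{run experiment } e\,],$$

a predictor fit on previously observed experiments and their outcomes, and generalising only partially beyond them.)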
How do you get such learned models? No other way than by experimental throughput and observation thereof (direct or indirect: this can include textbooks, notes, and discussions with existing experts)!
See my discussion of research and taste.
As such, taste accumulates like a stock, on the basis of experimental throughput and the sample efficiency (of the individual or the team) at extracting the relevant updates to the VOI model. It 'depreciates' as you go, because the frontier of the known moves, drifting gradually outside the generalising region of the taste heuristic (eventually falling back to naive trial and error), most saliently here with data and model scale, but also in other ways.
This makes sample efficiency (of taste accumulation) and experimental throughput extremely important, central in my view. You might think that expert interviews, reading every textbook ever written, etc., provide a meaningful jumpstart to the taste stock. But they certainly don't help with the flow. So then you need to know how fast it depreciates over the relevant regime.
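As a toy way to write the stock picture down (the symbols are mine, purely illustrative):

$$T_{t+1} \;=\; (1 - \delta_t)\,T_t \;+\; s \cdot n_t,$$

where $T_t$ is the taste stock, $n_t$ the experimental throughput in period $t$, $s$ the sample efficiency of converting observed experiments into VOI-model updates, and $\delta_t$ the depreciation from the frontier moving outside the heuristic's generalising region. On this picture, interviews and textbooks raise the starting stock $T_0$ but add nothing to the flow term $s \cdot n_t$, so their contribution washes out at a rate set by $\delta_t$.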
(Besides pure heuristic improvements, if you think faster you can also reason your way to somewhat better experiment design, either by naively pumping your taste heuristics for best-of-k selection, or by combining and iterating on designs. I think this reasoning boost falls off quite sharply, but I'm unsure. See my question on this.)