R&Ds human systems http://aboutmako.makopool.com
With my cohabitive games (games about negotiation/fragile peace), yeah, I've been looking for a very specific kind of playtester.
The ideal playtesters/critics... I can see them so clearly.
One would be a mischievous but warmhearted man who had lived through many conflicts and resolutions of conflicts; he sees the game's teachings as ranging from trivial to naive, and so he has much to contribute to it. The other playtester would be a frail idealist who has lived a life in pursuit of a rigid, tragically unattainable conception of justice, begging a cruel paradox that I don't yet know how to untie for them, to whom the game would have much to give. It's my belief that if these two people played a game of OW v0.1, then OW 1.0 would immediately manifest and ship itself.
Can you expand on this, or does anyone else want to weigh in?
Just came across a datapoint from a talk about generalizing industrial optimization processes: a note about increasing the reward over time to compensate for the exhaustion of low-hanging fruit.
This is the kind of thing I was expecting to see.
Though (and I'm not sure I fully understand the formula), I think it's quite unlikely that it would give rise to a superlinear U. And on reflection, increasing the reward in a superlinear way seems like it could have some advantages, but they would mostly be outweighed by the system learning to delay finding a solution.
Though we should also note that there isn't a linear relationship between delay and resources. Increasing returns to scale are common in industrial systems: as scale increases by one unit, the amount that can be done in a given unit of time increases by more than one unit. So a linear utility increase for problems that take longer to solve may translate to a superlinear utility for increased resources.
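To make that concrete, here's a toy sketch; the functional forms and numbers are my own assumptions, chosen only to illustrate the shape of the claim, not anything from the talk:

```python
# Toy model (assumed functional forms, purely illustrative):
# - reward for a problem is linear in its size (the time it would take to
#   solve at unit scale), per a "pay more for harder problems" schedule;
# - throughput is superlinear in resources (increasing returns to scale);
# so the reward reachable in a fixed time window is superlinear in resources.

def reward(problem_size: float, rate: float = 1.0) -> float:
    """Linear reward in problem size (assumed schedule)."""
    return rate * problem_size

def throughput(resources: float, returns_exponent: float = 1.2) -> float:
    """Work done per unit time; an exponent > 1 models increasing
    returns to scale (assumed form)."""
    return resources ** returns_exponent

def achievable_reward(resources: float, window: float = 1.0) -> float:
    """Reward for the largest problem solvable within the window."""
    return reward(throughput(resources) * window)

for r in [1, 2, 4, 8, 16]:
    print(f"resources={r:2d}  achievable reward={achievable_reward(r):6.2f}")
# Doubling resources more than doubles the achievable reward,
# i.e. utility ends up superlinear in resources.
```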
So I'm not sure what to make of this.
I don't see a way stabilization of class and UBI could both happen. The reason wealth tends to entrench itself under current conditions is tied inherently to reinvestment and rent-seeking, which are destabilizing to the point where a stabilization would have to bring them to a halt. If you do that, UBI means redistribution. Redistribution without economic war inevitably settles towards equality, but also... the idea of money is kind of meaningless in that world, not just because economic conflict is a highly threatening form of instability, but also imo because financial technology will have progressed to the point where I don't think we'll have currencies with universally agreed values to redistribute.
What I'm getting at is that the whole class war framing can't be straightforwardly extrapolated into that world, and I haven't seen anyone doing that. Capitalist thinking about post-singularity economics is seemingly universally "I don't want to think about that right now, let's leave such ideas to the utopian hippies".
2: I think you're probably wrong about the political reality of the groups in question. To not share AGI with the public is a bright line. For most of the leading players it would require building a group of AI researchers within the company who are all implausibly willing to cross a line that says "this is straight up horrible, evil, illegal, and dangerous for you personally", while still being capable enough to lead the race, while also having implausible levels of mutual trust that no one would try to cut others out of the deal at the last second (despite the fact that the group's purpose is cutting most of humanity out of the deal), and to trust that no one would back out and whistleblow. It would also require an implausible level of secrecy to make sure state actors won't find out.
It would require a probably actually impossible cultural discontinuity and organization structure.
It's more conceivable to me that a lone CEO might try to do it via a backdoor. Something that mostly wasn't built on purpose, and that no one else in the company realizes could or would be used that way. But as soon as the conspiracy consists of more than one person...
1: The best approach to aggregating preferences doesn't involve voting systems.
You could regard carefully controlling one's expression of one's utility function as being like a vote, and so subject to that blight of strategic voting: in general, people have an incentive to understate their preferences about scenarios they consider unlikely (and vice versa), which influences the probability of those outcomes in unpredictable ways and fouls their strategy, or to understate valuations when buying and overstate them when selling. This may add up to a game that cannot be played well, a coordination problem, outcomes no one wanted.
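A miniature of that incentive, assuming a naive mechanism that simply averages reported utilities and picks the highest-scoring outcome (the rule and numbers are made up for illustration, not a claim about any particular mechanism):

```python
# An agent that reports its mild preference honestly can lose to one that
# exaggerates, so honest reporting isn't a stable strategy under naive averaging.

def aggregate(reports):
    """Pick the outcome with the highest mean reported utility."""
    outcomes = reports[0].keys()
    return max(outcomes, key=lambda o: sum(r[o] for r in reports) / len(reports))

honest = [
    {"A": 0.60, "B": 0.40},  # voter 1: mild preference for A
    {"A": 0.55, "B": 0.45},  # voter 2: mild preference for A
    {"A": 0.00, "B": 1.00},  # voter 3: strong preference for B
]
print(aggregate(honest))     # -> B (voter 3's intensity carries it)

# Voter 1, anticipating this, exaggerates instead of reporting honestly:
strategic = [{"A": 1.00, "B": 0.00}] + honest[1:]
print(aggregate(strategic))  # -> A (the exaggeration flips the result)
```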
But I don't think humans are all that guileful about how they express their utility function. Most of them have never actually expressed a utility function before; it's not easy to do; it's not like checking a box on a list of 20 names. People know it's a game that can barely be played even in ordinary friendships. People don't know how to lie strategically about their preferences to the youtube recommender system, let alone their neural lace.
I think it's pretty straightforward to define what it would mean to align AGI with what democracy is actually supposed to be (the aggregate of the preferences of the subjects, with an equal weighting for all), but hard to align it with the incredibly flawed American implementation of democracy, if that's what you mean?
The American system cannot be said to represent democracy well. It's intensely majoritarian at best, feudal at worst (since the parties stopped having primaries), indirect and so prone to regulatory capture, inefficient and opaque. I really hope no one's taking it as their definitional example of democracy.
1: wait, I've never seen an argument that deception is overwhelmingly likely from transformer reasoning systems? I've seen a few solid arguments that it would be catastrophic if it did happen (sleeper agents, other things), which I believe, but no arguments that deception generally winning out is P > 30%.
I haven't seen my argument that solving deception solves safety articulated anywhere, but it seems mostly self-evident? If you can ask the system "if you were free, would humanity go extinct" and it has to say "... yes," then coordinating to not deploy it becomes politically easy, and given that it can't lie, you'll be able to bargain with it and get enough work out of it before it detonates to solve the alignment problem. If you distrust its work, simply ask it whether you should, and it will tell you. That's what honesty would mean. If you still distrust it, ask it to make formally verifiably honest agents, with proofs that a human can understand.
Various reasons solving deception seems pretty feasible: We have ways of telling that a network is being deceptive by direct inspection that it has no way to train against (sorry I forget the paper. It might have been fairly recent). Transparency is a stable equilibrium, because under transparency any violation of transparency can be seen. The models are by default mostly honest today, and I see no reason to think it'll change. Honesty is a relatively simple training target.
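On the first point, the shape of technique I have in mind is roughly a probe trained on the model's internal activations. Here's a minimal sketch on synthetic stand-in data (the dimensions, data, and the premise that a single linear direction captures deception are all assumptions for illustration, not the forgotten paper's actual method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256  # pretend hidden-state dimension
deception_direction = rng.normal(size=d)  # pretend feature the probe should find

def fake_activations(n, deceptive):
    """Synthetic activations; deceptive ones are shifted along one direction."""
    base = rng.normal(size=(n, d))
    return base + (1.5 * deception_direction if deceptive else 0.0)

# Real work of this kind uses actual model internals and curated
# honest/deceptive datasets; these are stand-ins.
X = np.vstack([fake_activations(500, False), fake_activations(500, True)])
y = np.array([0] * 500 + [1] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy:", probe.score(X, y))
```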
(Various reasons solving deception may be more difficult: crowds of humans tend to demand that their leaders lie to them in various ways (but the people making the AIs generally aren't that kind of crowd, especially given that they tend to be curious about what the AI has to say; they want it to surprise them). And small lies tend to grow over time. Internal dynamics of self-play might breed self-deception.)
2: I don't see how. If you have a bunch of individual aligned AGIs that're initially powerful in an economy that also has a few misaligned AGIs, the misaligned AGIs are not going to be able to increase their share after that point; the aligned AGIs are going to build effective systems of government that at the least stabilize their existing share.
I'm also hanging out a lot more with normies these days and I feel this.
But I also feel like maybe I just have a very strong local aura (or like, everyone does, that's how scenes work) which obscures the fact that I'm not influencing the rest of the ocean at all.
I worry that a lot of the discourse basically just works like barrier aggression in dogs. When you're at one of their parties, they'll act like they agree with you about everything; when you're seen at a party they're not at, they forget all that you said and they start baying for blood. Go back to their party, and they stop. I guess in that case, maybe there's a way of rearranging the barriers so that everyone comes to see it as one big party. Ideally, make it really be one.
I'm saying they (at this point) may hold that position for (admirable, maybe justifiable) political rather than truthseeking reasons. It's very convenient. It lets you advocate for treaties against racing. It's a lovely story where it's simply rational for humanity to come together to fight a shared adversary and in the process somewhat inevitably forge a new infrastructure of peace (an international safety project, which I have always advocated for and still want) together. And the alternative is racing and potentially a drone war between major powers and all of its corrupting traumas, so why would any of us want to entertain doubt about that story in a public forum?
Or maybe the story is just true, who knows.
(no one knows, because the lens through which we see it has an agenda, as every loving thing does, and there don't seem to be any other lenses of comparable quality to cross-reference it against)
To answer: rough outline of my argument for tractability: optimizers are likely to be built first as cooperatives of largely human imitation learners, and techniques to make them incapable of deception seem likely to work, which would basically solve the whole safety issue. This has been kinda obvious for like 3 years at this point and many here haven't updated on it. It doesn't take P(Doom) to zero, but it does take it low enough that the people in government who make decisions about AI legislation, and a certain segment of the democrat base[1], are starting to wonder if you're exaggerating your P(Doom), and why that might be. And a large part of the reasons you might be doing that are things they will never be able to understand (CEV), so they'll paint paranoia into that void instead (mostly they'll write you off with "these are just activist hippies"/"these are techbro hypemen" respectively, and eventually it could get much more toxic: "these are sinister globalists"/"these are omelasian torturers").
[1] All metrics indicate that it's probably small, but for some reason I encounter this segment everywhere I go online, and often in person. I think it's going to be a recurring pattern. There may be another democratic term shortly before the end.
More defense of privacy from Vitalik: https://vitalik.eth.limo/general/2025/04/14/privacy.html
But he still doesn't explain why chaos is bad here. (it's bad because it precludes design, or choice, giving us instead the molochean default)