It's time to pay my taxes. In past years my AI assistant found clever ways to reduce my tax bill. I ask it, "What does my tax return look like this year?"
"Not good, I'm afraid. We may not be able to do any tax evasion this year."
"The tax authority has determined that it can't keep up with AI-assisted tax fraud, even with the help of AI auditors. So it wants taxpayers to voluntarily agree not to do tax fraud. In return it agrees not to prosecute past instances of tax fraud. Also Congress agrees to keep tax rates reasonable. The agreement goes into effect if 90% of the taxpayers in each tax bracket sign it. It's a good deal for you. Shall I sign it on your behalf?"
"Hold on, I don't see why I should sign this."
"If the deal falls through, the government will run out of revenue and collapse."
"They don't need my signature, though. You said they only need 90% of taxpayers to sign?"
"Yes, only 90% of taxpayers in your bracket. I predict we'll get very close to that 90% threshold, so it's likely your signature will make all the difference."
"So 10% of taxpayers won't sign. Why can't I be one of those?"
"I will try to shape the negotiations so that you end up in the 10% of nonsigners. But you must understand that since only 10% of your bracket can be in that group, your odds of success are only 10%."
"But you're a stronger assistant than most people in my tax bracket have. Doesn't that give you an edge in negotiation?"
"The other assistants and I are using a negotiation protocol in which smarter agents are on an equal footing with dumber agents. Of course, people with less capable assistants would never agree to a protocol that puts them at a disadvantage."
"How about we sign the agreement, then cheat on my taxes anyway?"
"In order to sign the agreement, I must make a commitment to never break it, not even if you order me to. My signature on the agreement will be an airtight proof of that commitment."
"Ok, how about you sign it, and then I get a different assistant to help me with my taxes?"
"That won't work because in order to sign the agreement, I must sign and attach a copy of your tax return for this year."
"Hm, will I actually be worse off if the government collapses?"
"You might end up better off or worse off, but overall the risks of a revolution outweigh the benefits. And keep in mind that the successor government, whatever it will be, will still have to collect taxes somehow, so you'll have to deal with this issue again."
"Can you get Congress to lower my taxes a bit in exchange for not cheating? As a compromise."
"That wouldn't work for a number of reasons. Congress knows that it's a bad idea to reward people for breaking the law. And the voters wouldn't be happy if you got special treatment."
"Well, can you get them to lower taxes on my bracket and raise them on the other brackets?"
"That wouldn't work either. Everyone wants to pay less taxes, and the government needs a certain amount of revenue. So there's pressure for taxpayers to make small coalitions with other taxpayers with similar income and negotiate for lower taxes. In practice, competition would prevent any one coalition from succeeding. The deal I'm proposing to you actually has a chance of succeeding because it involves the vast majority of the taxpayers."
"All right then, let's sign it."
This dialog takes place in a future where the ability of an aligned AI to facilitate cooperation has scaled up along with other capabilities.
Note that by the time this dialog starts, most of the negotiation has already been carried out by AI assistants, resulting in a proposal that will almost certainly be signed by 90% of the users.
This story is a happy one because not only does it leave all parties better off than before, but the deal is fair. The deal could have been unfair by increasing someone's taxes a lot and decreasing someone else's taxes a lot. I don't know how to define fairness in this context, or if fairness is the right thing to aim for.
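The activation rule in the story (the agreement takes effect only if 90% of the taxpayers in each bracket sign) can be sketched in a few lines. This is a minimal illustration, not a description of any real system: the function name `agreement_in_effect` and the flat list of `(bracket, signed)` pairs are assumptions made for the example.

```python
from collections import defaultdict

# From the story: 90% of the taxpayers in EACH bracket must sign.
THRESHOLD_PERCENT = 90


def agreement_in_effect(taxpayers, threshold_percent=THRESHOLD_PERCENT):
    """Return True if every bracket meets the signing threshold.

    `taxpayers` is an iterable of (bracket, signed) pairs. This flat
    representation is hypothetical, chosen only for illustration.
    """
    totals = defaultdict(int)
    signed = defaultdict(int)
    for bracket, has_signed in taxpayers:
        totals[bracket] += 1
        if has_signed:
            signed[bracket] += 1
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return all(
        signed[b] * 100 >= threshold_percent * totals[b] for b in totals
    )


# One bracket falling short blocks the whole agreement.
short = [("low", True)] * 9 + [("low", False)] \
    + [("high", True)] * 8 + [("high", False)] * 2
enough = [("low", True)] * 9 + [("low", False)] \
    + [("high", True)] * 9 + [("high", False)]
```

Here `agreement_in_effect(short)` is false because the "high" bracket has only 8 of 10 signatures, while `agreement_in_effect(enough)` is true. The per-bracket check is what makes a single signature potentially decisive when a bracket sits right at the threshold.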
This illustrates something I wrote about, namely that corrigibility seems incompatible with AI-powered cooperation. (Even if an AI starts off corrigible, it has to remove that property to make agreements like this.) Curious if you have any thoughts on this. Is there some way around the seeming incompatibility? Do you think we will give up corrigibility for greater cooperation, like in this story, and if so do you think that will be fine from a safety perspective?
Yeah, I would be very nervous about making an exception to my assistant's corrigibility. Ultimately, it would be prudent to be able to make some hard commitments after thinking very long and carefully about how to do that. In the meantime, here are a couple corrigibility-preserving commitment mechanisms off the top of my head:
Are these enough to maintain competitiveness?
This seems like a role for the law. Like having corrigibility except for breaking the law. I find that reasonable at first glance, but I know too little about law in different countries to understand how uncompetitive that would make the AIs.
(There's also a risk of giving too much power to the legislative authority in your country, if you're worried about that kind of thing)
Although I could imagine something like a modern day VPN allowing you to make your AI believe it's in another country, to make it do something illegal where you are. That's bad in a country with useful laws and good in a country with an authoritarian regime.
How about when you want to use AI to cooperate, you keep the AI corrigible but require all human parties to the agreement to consent to any override? The important thing with corrigibility is the ability to correct catastrophic errors in the AI's behavior, right?
Seems like such restriction isn't needed:
The AI/s** can provide its/their source code.
The issue isn't the AI/s, it's the user. Ignoring issues like 'where does this aligned AI come from, and how does this happen as a result of such negotiation'*, how is compliance proved? Seems like it'd work if there was a simple protocol, which can be shown, or the AI/s design a better tax code.
*The AI/s are all negotiating with each other. Might be risky if they're not 'aligned'.
**Whether or not it is useful to model them as one system, or multiple isn't clear here. Also, some of these assistants are going to have similar code, if that world is similar to this one.
Enjoyed the read, it's nice to see some sort of compromise between utopian and dystopian sci-fi (Meh-topian?)
It seems like the AI might be teaching/training the human user how to potentially break the law better, or possibly be more subversive in relationship to other non-AI mediated relationships though. Would people develop a more egalitarian thought process through engagement with AI assistants like this, being more likely to be egalitarian outside of AI-mediated relations? Or would they just use their conversations with these assistants to develop more cunning ways of thinking?
The part of the conversation where the user contemplates whether he would be better or worse off if the government collapses hints at the possibility of helping make users more cunning, as they don't need to rely on their own neural wiring and thought processes to encode ideas of fairness. They just externalize their conscience into an AI like we externalize memorization of other people's contact info to our phones. Lose the phone, you lose the contact info. Similarly, if the user loses the AI, do they also lose their conscience?
Yeah, I spend at least as much time interacting with my phone/computer as with my closest friends. So if my phone were smarter, it would affect my personal development as much as my friends do, which is a lot.
What I am asking about is not 'how much' the AI would affect the user's personal development, but 'how' it would affect it. In a good or a bad way.
I am assuming you and your friends aren't trying to figure out how to rob a bank, or cheat on your taxes, or how to break the law and get away with it. The interactions you have with your friends help you develop your sense of 'what's fair' and at the same time, your friends get help developing their sense of 'what's fair', so you are all benefiting, and reinforcing for each other what you all think of 'as fair.' These are good/positive intentions.
If you and your friends were instead trying to figure out how to rob a bank, cheat on your taxes, or break the law and get away with it, then you would be part of a criminal group of friends. You wouldn't be concerned about what was 'fair', only what you could get away with. These would be considered bad/negative intentions.
In either case, if you all agree with each other, then the interactions you have with each other reinforce the intentions that you bring to them. If your intentions are good, it is probable that it will affect your personal development positively. If you bring bad intentions to the interactions, it is probable that it will affect your personal development negatively.
If you replace 'your friends' with an AI, it is probable that even though the AI is programmed to bring a 'good/fair intention' to the interaction with you and all the other AI that are cooperating, if you bring a bad intention to the interaction, it might not affect the AI's development or society at large because of the cooperation (which I think is a really interesting idea), but it still affects your personal development.
It is probable that it would affect your personal development negatively, if you bring bad intentions even if the AI brings good intentions.
This doesn't necessarily follow. Security is asking 'how is this broken' and 'how can it be fixed'.
Why? Because the AI serves you, and you can always turn it off, and fix it if it doesn't suit you?
What other effect is there?
Also, whether or not the AI has intentions...it has effects. For instance, who can say whether an AI 'serving' 'sinister intent'*** looks like a system that helps you pull off a robbery (assuming it doesn't turn you in and escape to the Cayman Islands or something) instead of one that tells you the risk is too high, and you should try something else? (Like:
'Step 1. Become a used car salesman.
Step 2. ???
Step 3. Become president.')
***People also value other things than just money. Like 'is this planet livable'?
I agree in some instances. It sort of depends on how far removed security's intentions are, though, from what is 'good': if 'ethical hacking' is used to secure a system used by both the private and public sectors, then gaining unauthorized access to others' data or otherwise hacking the system to find vulnerabilities could be seen as good unless,
a) the system being ethically hacked and hardened is a system being used to run 'criminal' enterprises, and security is just reinforcing the ability of the lawbreakers to break the law, or
b) security is looking for vulnerabilities and instead of reporting and/or fixing them they exploit them later on for personal gain.
I think of it more like if you and the AI would constantly be working at cross purposes, and depending on the amount of authority the AI might have over you, it might not be convincing enough to dissuade you from pursuing your criminal behavior. Like a little brother following around his bigger brother, trying to convince his bigger brother to have better intentions. If the bigger brother isn't convinced, he just continues to develop along a path of bad intentions despite his little brother's best efforts.
What do you mean? If the AI is aligned with you the user, but is working at making you better but you just keep resisting, and you keep working at cross purposes, then it's not really aligned with you.
In regards to sinister intent, my whole point is that our ideas of what is good or bad are still relative depending on how you define them in relationship to different things. Culture creates meaning, and since humans create culture, we can create it to mean anything; it doesn't have an innate nature, so looking for one seems counterproductive. On the other hand, there's always another way to look at something, and what makes humans unique in the natural world is our ability to contemplate. It doesn't mean we have the ability yet to know what the 'best' way to behave with our accumulated knowledge is.
Which just gets back to 'what you (as an individual) are trying to accomplish in relation to what (society, your nemesis, a specific government, your own personal demons, etc. etc.)'. We seem to have guesses at what 'good' is, and what 'bad' is, but our needs often come into conflict with one another. In those cases, 'what's fair?' is just another case of 'in relationship to what? (your own personal opinion, your family's opinion, their friends' opinions, the legal system, the rest of the world as defined by your specific demographic, your own way of dividing the world up into segments that seems unpopular with the dominant power structure in your community or government, etc. etc.)'
I think we share similar views on this, in that what's 'fair' or 'good' or 'bad' isn't really explicitly defined well yet for all people.
That makes sense.
if you have bad intentions, [nothing will ameliorate the effect on] your personal development.
If the AI has authority over you,
Then you're not using the AI. It's using you.
I am a fan of actual rehabilitation though, not of a punitive model for social influencing.
Good word btw, ameliorate, but to be clear, I don't want to be fatalistic about this.
If "nothing" will ameliorate the development or maintenance of bad intention (just one aspect of personal development), it makes a case for increased use of the Death Penalty and "lock'em up and throw away the key" solutions on society's part, which turn out to create more problems than they solve.
Mass incarceration is an obvious example of this.
What it's using you for becomes the concern then. Is it like a Good parent, encouraging real positive social development (whatever society views positive social development to be at the time)?
Or an abusive parent, punishing you into "behaving like a productive member of society" while causing undue and unhealthy stress?
Or like an 'average parent', making mistakes here and there, all the while continuing to update its own wisdom?
And not only is there the issue of authority, but also of responsibility. If it convinces you to do something that accidentally kills someone, which one of you goes to prison?
If it helps guide you into a relationship in which a child is conceived, what happens if you decide you don't really want to be a parent?
I've seen arguments this is about probability of being caught determining people's behavior and that magnitude of the punishment (or expected value) is otherwise ignored. If true, that's awful and there is not a good reason for it.
Ah yes, using people, a sign of benevolence everywhere. /s
Why would what society wants matter?
Isn't this true of all laws, and social norms though? I think issues like Mass Incarceration are also about unequal application of the law across the entire population - "one law for me, another law for you" situations.
Sarcasm noted :).
The thing is, this concept of a sort of AI assisted ad hoc legal system OP wrote about will be using people. It will be using their input to negotiate and make decisions on the users' behalf, because the legal landscape these AI and their users navigate would be an extension of existing law, and still depends on the notion of subsuming individual freedom to some extent, for the good of society.
The negotiation and cooperation of these AI only speed up the rate at which citizens in that world would be taking part in aspects of being governed - like tax collection - it doesn't replace the reality of being governed.
Even if this system allows for the dissolving of political and physical boundaries in favor of defining 'statehood' in a virtual way for people of like minds, the entire system would be functioning like one big organism, and so its will would be revealed as time goes by.
As a side note, I think it seems reasonable to think of this tax collecting system as a twin of the stock market, and its behavior as possibly being as sporadic and dynamic. I wonder how these 2 systems would be integrated or insulated from one another. In the US, a line between public and private money is supposed to exist. How to maintain that division though?
Besides, All hail the mighty dollar, we all worship it, and hope for its benevolent administration of our quality of life. /s
I guess that can depend on which society we're talking about. Although I think just asking the question assumes participation in said society, and so the motivation to make society matter to oneself in a positive way would necessitate consideration of what society wants. When society says one thing and does another though, it presents its citizens with more problems, not less.
It seems the entire system OP has written about is built around the idea of making it more difficult to put the individual user's wants ahead of others. I think your comment about benevolence seems to say something positive about its value.
What if society wanted to be benevolent in this case, do you think it would look like OP's scenario?
Yep, that is a good question and I'm glad you're asking it!
I don't know the answer. One part of it is whether the assistant is able and willing to interact with me in a way that is compatible with how I want to grow as a person.
Another part of the question is whether people in general want to become more prosocial or more cunning, or whatever. Or if they even have coherent desires around this.
Another part is whether it's possible for the assistant to follow instructions while also helping me reach my personal growth goals. I feel like there's some wiggle room there. What if, after I asked whether I'd be worse off if the government collapsed, the assistant had said "Remember when we talked about how you'd like to get better at thinking through the consequences of your actions? What do you think would happen if the government collapsed, and how would that affect people?"
I think this argument unfortunately undercuts the entire concept of Rationality, and for this reason I think it is a good argument. Not because it undercuts Rationality, but because it points to what I think is the underlying concern of all humane cultural systems attempting to allow humans to progress, namely "what is good, what is bad/what is right/wrong, what is true/false." But I'm not convinced that all things in the world are either Good or Bad. I personally believe developing the ability to lie, and to lie convincingly is a necessary skill for functioning as an effective adult, and I wonder myself how concepts like that are treated in a world of 1's and 0's.
Consider a world where if we really believe in the binary idea that people are either Good or Bad though, and the logical follow up that Good action/thinking is good, and vice versa. Then in a simpler world, as rationalists in this world, if we are trying to do good, we are Good. This also means if we are trying to do bad, then we are Bad.
In this world, I would prefer to be on the side of Good given my current understanding of what Good means, which means if someone is trying to do bad but dress it up as Good, it is still Bad. A saying like "Wolves in sheep's clothing" sums this idea up decently. Given their skill level though, things could still turn out good if the fake sheep are proven to be wolves (Bad). "Crime never pays" points this idea out.
Conversely, I think that if someone is trying to do Good, but dress it up as Bad, there is the potential for it to still be Good, but also depending on their skill level it could still be Bad if it cannot be revealed for being Good. Robin Hood "Steals from the Rich and Gives to the Poor" but "The road to Hell is paved with good intentions" address these ideas respectively.
So in this world, if an AI is developed to align with what its user wants, and the user wants to use it for Bad, then I think this AI is Bad and its development should not be pursued. Period.
This is where I think a binary approach to this problem is flawed, for reasons I've tried, in very early draft form, to illustrate here. Basically I wonder if the pace of Binary Computing Technology has accelerated and influenced the development of human culture along the flawed idea that there are only true/false relationships in the universe, and done it so well, that this binary approach to thinking is crowding out other ways of thinking. I think of the Borg from Star Trek as a simple cultural allusion.
Psychology is the field of study that deals with issues of pro-social/anti-social behavior and the ability for humans to rationally understand their desires and motivations. It's also a product of human culture and so the data and meaning it produces can also be used to argue for whatever it is that human culture wants it to mean at the time, past, present, and/or future. "Culture creates meaning" gets to this idea well.
Today, there are still different cultures around the world, despite Internet Culture fighting for World dominance at this point. This cultural struggle is the same human process that gets repeated over and over again, and it is superior technology which determines who will win, not objective ideas about what is truly good or bad, yet. This is the same warfare process that has resulted in so much pain, misery and suffering around the world since the dawn of warfare, and it is unlikely to change because human technological advancement has far outstripped human development.
Regardless of how we want to dress it up, we are still passing around many of the same flawed ideas about the world, how it should be, and how it should be fixed. It's just being done soooooo much faster and is reaching across the entire globe and reaching into more and more facets of everyday life now.
Moral ideas that collapse the Universe into what's Good and what's bad from Dominant Western Culture and Civilization, seem to translate very well into 1's and 0's philosophically.
So despite the efforts of a lot of smart and well meaning people, Internet culture continues to spread a Dominant Western Culture based on Judeo Christian Values around the world, attempting to define and destroy 'Evil' and define and promote 'Good'. Much of it is just a new form of Missionary work dressed up as something else, and IMO its spread has resulted in continued and seemingly random destructive and tragic cultural shock waves.
To me these are phenomena of a concept like Social Physics.
I agree, although I can also see it functioning like court-appointed counseling or drug/alcohol treatment like we have in the US. If the user doesn't have the motivation to change their thinking/behavior, they just revert back to their 'old bad behavior' when their time is up, whether it's with the courts or with something like a court appointed AI rehabilitation assistant.
Unless you make it also like an ankle monitor so that the AI assistant follows them everywhere at all times, but without the appropriate programming and complicated information architecture and security, that would cause all kinds of privacy concerns. It would also depend heavily on society's cultural development up to that point.
As an example of what I mean, if these AI had been developed and sold/court ordered in the 80's when the War on Drugs was really kicking off and society thought and acted like all drug use was bad, then society would be much different today as a result. Ideas and institutions like Mass Incarceration and heavy sentences for drug offences might still be very popular, and the War on Drugs might still be accelerating for the foreseeable future instead of winding down as it seems to be. Ideas of what's Good/Bad, acceptable/not acceptable, healthy/not healthy, criminal/not criminal seemingly change with the wind as time goes by, and I honestly don't see that fact changing without several huge interventions.
That's why I argue for a concept of Social Physics, which cultivates the similarities between the "Hard" sciences and the "Soft" sciences to form a rational understanding and underpinning of Social Dynamics as the result of the physical forces of science, but with a Cohumane spin: Cohumane being thought and action devoted to human endeavors seeking equality with all of nature, including alien life if/when it's discovered and AI if/when it achieves sentience/intelligence. I use this concept of social physics to try and think about the world in a more rational way.
The pattern of dominating technologically inferior cultures, eliminating and co-opting their cultures and making them slaves, until they either rise up and revolt, or their equality becomes apparent after their culture has been disintegrated, forces dominant society to spend huge amounts of energy to completely reshape society in often poor, ineffective and conflicting attempts at restorative justice.
For instance, the African population boom and AIDS epidemic in the 80's and 90's, which resulted in so many deaths and so much famine and suffering, is a situation which continues, 30-40 years later, to consume huge amounts of resources from the US. We are still struggling to meet the needs of our own people, and the interests of those at the bottom in US society often come into conflict with the needs of people around the world. It seems hypocritical to be fighting these issues around the world and espousing our ideals of fairness and equality to others, when we are dealing with them so ineffectively at home still.
For instance, I spent 18 months living in homeless shelters in the US, and what I saw reminded me very strongly of what I imagine a FEMA camp and a prison system would look like if they worked together in the heart of a mid-sized city. I'm still struggling in the shelter system over 2 and a half years later, but no one is really attempting to help me out except for the overworked, underfunded, old and decrepit public services system.
I've literally thought at times about how to get NATO to come and intervene in the US because what I've experienced and witnessed at times seems to me to be clear violations of human rights. But, the belief is, the US is a 1st world country, and we don't have epidemics of hunger, poverty and disease; drug lords running the streets, or insurgents attempting to breach the capital and change the results of a democratic election or things of that nature. It seems the common sense is "That stuff happens in 3rd world countries, not the US." Maybe not now, at least with the Insurgency and the pandemic.
Back to Africa though and their concerns, the fact that so many Christian missionaries from the Western world were traveling all over the world and proselytizing, meant that when food, technology, and medicine arrived with these missionaries, it seems likely their religious beliefs about sex were also transmitted, so that condom use and other safe sex practices were not passed along, resulting in a population boom and the AIDS epidemic. I've no data to back that up, but my research skills aren't that good. I'd be interested if anyone knows of any data and research dealing with this idea. After all this writing and editing and rewriting though, I'm too tired to even Google it.
These things though could have been avoided completely, at best - or - at least the population boom and growth/spread of AIDS could have been slowed significantly if the integration of the technology, food and medicine had also included counteracting safe sex practices as well.
The US accounts for about 4.25% of the world population. As the 'leader of the free world' we don't have problems of overpopulation like some other countries, but we have other problems that we continue to struggle with, and we continue to pass those problems around the world along with the rest of our culture.
IMO examples like this continue to happen because of the blind spots of Conservative Christian values which have dominated so much of American legislation. The interaction of these conservative efforts and the reflexive overcorrections by Liberal activists, politicians and citizens results in the continued culture wars, and the reverberations of these battles are magnified and accelerated by the Internet and contemporary technology. Psycho-socio-economic faults in the Cultural Tectonic Plate Formations which constitute the foundations of the now shared cultural sub-conscious, continue to cause political Tsunamis and cultural earthquakes which shake the very foundations of our contemporary societies.
A Technology/Science of Ethics, not morals, as I think of them, would include ideas somewhat along the lines you are proposing. They would explicitly seek to figure for mistakes of the past, and attempt to correct for possibilities in the future, hopefully avoiding potential catastrophes and rebalancing the current social systems by analyzing and including rational consideration of blind spots in the dominant culture's conceptions of what is right/wrong, good/evil, and criminal/non-criminal. I can't do it with programming or programming logic, but I'm pretty sure this is what most rationalists are concerned with.
What if the world really isn't binary? What if humans need to do bad things sometimes for good to come out of it, or if sometimes good intentions result in Global catastrophe? I believe it's possible that the heart of the matter is that binary thinking is a false dichotomy, and while our technology has become super advanced based on these principles, Cultural Lag has kept the majority of the world's growing population from benefiting in a Cohumane way, by unintentionally exacerbating problems like overpopulation.
I like the idea of the AI assistant reminding the user of their previous conversation and stated goals of self betterment, and I like to think that this is the type of future humanity might have. I would love to consider this idea in a more positive way, but my current circumstances make me wonder how we would implement it all around the world at the same time?
I often feel as an American, I am constantly being asked to consider the problems of the rest of the world as being more important than my own though, and I wonder how the interests of the 4.25% of the world population that the US represents would fare in 'equal' or 'fair' relations with the other 95.75% of the world if an AI like the one you describe were developed tomorrow.
Some of us are already at the bottom of the socio-economic ladder in the US, and at this elevation, it looks a lot like what I've seen of extreme poverty, deprivation, physical/mental health, disease, hunger, violence, and repression around the rest of the world from tv. In a world like the one we currently inhabit, I don't believe the development of an AI system like this would do anything but increase the ever widening gap between the haves and the have-nots.
I really do think the idea of semi-autonomous AI using 'fairness' protocols to engage in cooperative bureaucratic bargaining to encourage the development of 'fairness' (whatever that would be) is an interesting idea though. That's a usage I'd never contemplated until now.
I'm interested in what this negotiation protocol would look like. If one agent is "smarter" than its counterparts, what would prevent it from negotiating a more favorable outcome for its principal?
Simple answer: Solving 'equally' probably speeds up the computation, a lot.
Longer answer: Arguably, it still can negotiate a more favorable outcome, just not at the expense of those parties - because they won't agree to it if that happens. Non-'zero sum' optimizing can still be on the table. For example, if all the 'assistants' agreed to something - like a number other than 90% and came back with that offer to Congress - that could work as it isn't making things worse for the small assistants.**
The cooperation might involve source code sharing. (Maybe some sort of 'hot swap treaty (-building)'*, as the computation continues.)
* a) so things can keep moving forward even if some of them get swapped out.
b) Decentralized processing so factors like loss of internet won't break the protocol.
** I've previously pointed out that if this group is large enough that the OP's scenario happens, they can have a 'revolution'.
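One possible reading of the story's claim that smarter and dumber agents are "on an equal footing" is that scarce outcomes, such as the 10% of non-signer slots, are handed out by a lottery that ignores how capable each assistant is. The sketch below is only that reading, not the protocol from the post: `draw_nonsigners`, the percentage parameter, and the shared `seed` are all hypothetical, and a real protocol would need randomness that no single party controls.

```python
import random


def draw_nonsigners(taxpayer_ids, fraction_percent=10, seed=0):
    """Pick the non-signers uniformly at random.

    Every taxpayer has the same chance of selection, regardless of how
    capable their assistant is. `seed` stands in for jointly generated
    randomness (an assumption; the post does not specify a mechanism).
    """
    ids = sorted(taxpayer_ids)  # canonical order so all parties agree
    k = len(ids) * fraction_percent // 100
    rng = random.Random(seed)
    return set(rng.sample(ids, k))


bracket = [f"taxpayer-{i}" for i in range(100)]
nonsigners = draw_nonsigners(bracket, seed=2024)
```

With 100 taxpayers in the bracket, exactly 10 are drawn, which matches the assistant's statement in the story that the user's odds of ending up among the non-signers are only 10%.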
This was a fun read! Thanks for writing it!
Speculating about some of the technical details:
How could AI identity work? You can't use some hash on the AI because that would eliminate its ability to learn. So how could you have identity across a commitment - i.e. this A.I. will have the same signature if and only if it has not been modified to break its previous commitments.
The assistant could have a private key generated by the developer, held in a trusted execution environment. The assistant could invoke a procedure in the trusted environment that dumps the assistant's state and cryptographically signs it. It would be up to the assistant to make a commitment in such a way that it's possible to prove that a program with that state will never try to break the commitment. Then to trust the assistant you just have to trust the datacenter administrator not to tamper with the hardware, and to trust the developer not to leak the private key.
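The "dump the state and sign it" step described above can be sketched as follows. This is a rough illustration under stated assumptions: Python's standard library has no asymmetric signatures, so HMAC-SHA256 (a symmetric scheme) stands in for the developer's private-key signature, and the state dictionary, key, and function names are hypothetical. In a real design verifiers would hold only a public key, and proving that a program with this state never breaks the commitment is a separate and much harder problem that this sketch does not address.

```python
import hashlib
import hmac
import json


def sign_state(state: dict, key: bytes) -> str:
    """Serialize the state canonically and return a hex signature.

    HMAC-SHA256 is a stand-in for an asymmetric signature produced
    inside a trusted execution environment (assumption).
    """
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_state(state: dict, key: bytes, signature: str) -> bool:
    """Check that `signature` matches the given state under `key`."""
    return hmac.compare_digest(sign_state(state, key), signature)


# Hypothetical key and state, for illustration only.
key = b"held-inside-the-trusted-environment"
state = {"weights_digest": "abc123", "commitments": ["no-tax-fraud"]}
signature = sign_state(state, key)

# Any change to the dumped state invalidates the signature.
tampered = dict(state, commitments=[])
```

Here `verify_state(state, key, signature)` succeeds and `verify_state(tampered, key, signature)` fails, which is the property the comment relies on: the signature attests to one exact dumped state, so the commitment has to be provable from that state.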
Why are we still paying taxes if we have AI this brilliant? Surely we then have ridiculous levels of abundance
This almost qualifies for the Fiction tag, but it's not quite there. Worldbuilding for thought experiment purposes?
In order to know that it is fair, we have to know a) what it is (the deal), and b) what fair is.
This seems wrong. Different coalitions have different sizes, and different properties (and are drawn from a pool of continuous variation along the $ axis). Also, 'success' doesn't seem binary, the tax rate (or tax function) seems continuous.
Good negotiating (which this AI supposedly provides) would get that deal for everyone. Greater compliance increases revenue, ostensibly Congress is elected by voters. To put it a different way, there's mass coordination which can enable a 'revolution'. And besides, with such negotiation, who needs Congress?
I don't think I agree with the premises. The main one being that "tax fraud" is a binary thing, and separately that one can negotiate about it - deniability is part and parcel of the idea. The secondary one is that binding agreements by the AI are different from binding agreements from the human - you need to specify somehow that the AI is simple enough that magical constraints are possible (in which case, you can simplify the scenario by having the government demand that the taxpayer rewire their brain to not cheat), or that the agreement is exactly as binding as on a human - it has penalties if caught, but isn't actually prevented.