If I parse things right, the initial state is something like 1/3 “I’m Luigi” 1/3 “I’m bowser” and 1/3 “I’m waluigi”, and the RLHF eliminates the bowser belief while having no effect on the other beliefs.
If this is based on the narrative prediction trained off of a large number of narrative characters with opposing traits, do all the related jailbreaking methods utterly fail when used on an AI that was trained on a source set that doesn’t include fictional plot lines line that?
My general reply is “if you think you’re spending too much time at the airport now, try missing a connecting flight”.
Different airports vary greatly in how much it sucks to unexpectedly spend the night in the terminal.
My experience with that behavior has been:
1: have a desired outcome in mind.
2: consider the largest visible difference between that outcome and the currently expected one
3: propose a change that is expected to alter the world in a way that results in that difference no longer being visible
4: if there are still glaring visible differences between the expected future and the desired one, iterate until there are no visible differences.
For example, people who see homeless encampments in public parks wish that they were not reminded of income inequality. They propose courses of action which assign blame to individual homeless people for the lack of housing, justifying forcibly removing them from the public areas. Those courses of action are expected to make the world look (to the people making the proposals) exactly like one in which there is no income inequality, therefore they implement sweeps.
In many political spheres, solving problems by making everyone shut up about them (and making non-issues problems by getting people to continuously mention them) actually works.
It seems like the core issue underlying all of these specific examples is that “gather more information about the expected outcome and seek additional options” choice isn’t considered.
Sometimes the price gouging actually is someone who is making an obscene profit even considering their expenses. Price-fixing in that specific case can just be the socially desired outcome, but the policy maker has to have detailed information about the specifics of that specific case.
So far the idea that an embryo will become immortal if it exits the womb alive has been taken as an article of faith by people who claim that abortion is bad because it results in a death; if the choice is examined as including an option where the death is preceded by little suffering and an option where the death is preceded by an expected lot of suffering and little redeeming quality, the position that abortion is bad because death is bad loses all basis.
If someone is drowning (literally or metaphorically) and you don’t consider the option of calling a trained lifeguard or other person more competent or better equipped than you,, you haven’t considered all of the easily available options.
If you only consider the complete social pressure against ever blocking anyone for any reason or the policy of blocking people who are persuasive about things that you refuse to allow people to be swayed about, you completely miss the possible option of only blocking and banning people who are actually toxic or harmful.
And if you are raising a child who wants to hang out with someone harmful, like a white supremacist or someone who talks in the theater, your options include allowing closely supervised activities, not just blanket permission and blanket refusal. You can bribe them to practice piano (although that also probably won’t get you what you think you want), or you can try to identify what the aversive thing about piano practice is and figure out an option that addresses it directly- if the problem is that they are frustrated that their skill is increasing slowly, setting better expectations about what skill growth looks and feels like could completely resolve the aversion to practice (but that’s just me taking five actual minutes to think about the generic class of problem; anyone actually experiencing a specific problem of that class should be able to develop several different intentions of action based on their actual situation).
One of the core problems informing the entire process is that “spend some resources (time, money, attention) to get more options or more information about the options” is often not considered to be one of the options, and that course of action is declined not because the cost-benefit ratio is too low, but because it was not considered at all.
Are there better tools for figuring out how much you value things?
Is that also a reason to oppose every other advance?
Why are you characterizing "contraindicated cancer screening" as "healthcare"? In either case, it's not central to the issue where rural specialists have two-month waits for appointments and four-hour waits from the appointment time.
You said "More healtchare isn't always better".
Can you give a central example about a situation where more people receiving healthcare is worse, and why we should characterize that situation as one where more people receive healthcare?
If the government restricts the supply of meat (and food generally is adequately distributed), then the finite supply of meat makes it positional, and the fact that all needs are being met (within the scope of the example, at least) makes the outcome satisfactory.
If you thought that I intended some element of "maximize production even if all needs have already been met", then we have completely failed to communicate.