alex.herwix

  • We train an LLM to be an expert on AI design and wisdom. We might do this by feeding it AI research papers and "wisdom texts", like principled arguments about wise behavior and stories of people behaving wisely, over and above those base models already have access to, and then fine tuning to prioritize giving wise responses.
  • We simultaneously train some AI safety researchers to be wiser.
  • Our wise AI safety researchers use this LLM as an assistant to help them think through how to design a superintelligent AI that would embody the wisdom necessary to be safe.
  • Iterate as necessary, using wisdom and understanding developed with the use of less wise AI to train more wise AI.

First, I wanted to dismiss this as not addressing the problem at all, but on second thought I think a key insight here may be that adding a focus on improving the wisdom of the relevant parties involved in AI development could help bootstrap more trustworthy "alignment verification" capacities.

However, I am not sure that something like this would fly in our economically oriented societies, since I would expect that wiser people would decline to develop superintelligent AI for the foreseeable future and would rather urge us to look inward for solutions to most of our problems (almost all of our problems are man-made, after all). Having said this, if we were to get a regime in place that could reliably ensure that "wisdom" plays a key role in decision making around AI development, this seems like as good a bet as any to help us deal with our predicament.

If we can train AI to be wise, it would imply an ability to automate training, because if we can train a wise AI, then in theory that AI could train other AIs to be wise in the same way wise humans are able to train other humans to be wise. We would only need to train a single wise AI in such a scheme who could pass on wisdom to other AIs.

I think this is way too optimistic. Having trained a wise person or AI once does not mean that we have fully understood what we did to get there, which limits our ability to reproduce it. One could argue that with fully reproducible AI training pipelines recreation may be possible, or that a wise AI could simply be copied, but we shouldn't assume this. The world is super complex and always in motion. Nothing is permanent. What has worked in one context may not work in another. Agents considered wise at one point may not be wise at another, and agents who turn out in hindsight to have been wise may not be recognized as such at the time.

In addition, producing one wise AI does not necessarily imply that this wise AI can effectively pass on wisdom at the required scale. It may have a better chance than non-wise AIs but we shouldn't take success as a given, if all we have managed is to produce one wise AI. There are many forces at play here that could subvert or overcome such efforts, in particular in race situations.

My gut feeling is that transmission of wisdom is somewhat of a coordination game that depends on enclaves of relatively wise minds cross checking, challenging, and supporting each other (i.e., Thich Nhat Hanh's “the next Buddha will be a Sangha”). Following this line of logic, the unit of analysis should be the collective or even ecology of minds and practices rather than the "single" wise AI. I acknowledge that this is more of an epistemic rather than ontological distinction (e.g., one could also think of a complex mind as a collective as in IFS) but I think it's key to unpack the structure of wisdom and how it comes about rather than thinking of it as "simply" a nebulous trait that can and needs to be copied.

To be honest, I am pretty confused by your argument and I tried to express one of those confusions with my reply. I think you probably also got what I wanted to express but chose to ignore the content in favor of patronizing me. As I don't want to continue to go down this road, here is a more elaborate comment that explains where I am coming from:

First, you again make a sweeping claim that you do not really justify: "Many (perhaps most) famous "highly recognized" philosophical arguments are nonsensical". What is your ground for this claim? Do you mean that it is self-evident that much (perhaps most) of philosophy is bullshit? Or do you have a more nuanced understanding of nonsensical? Are you referring to Wittgenstein here? 

Then you position this unjustified claim as a general prior to justify that your own position in a particular situation is much more likely to be valid than the alternative. Doesn't that seem a little bit like cherry picking to you? 

My critique of the post and your comments boils down to the fact that both are very quick to dismiss other positions as nonsensical and, by doing so, claim their own perspective/position to be superior. This is problematic because, although certain positions may seem nonsensical to you, they may make perfect sense from another angle. While this problem cannot be solved in principle, in practice it calls for investing at least some effort and resources into recognizing potentially interesting/valid perspectives and, in particular, staying open minded to the recognition that one may not have considered all relevant aspects and reorienting accordingly. I will list a couple of resources that you can check out if you are interested in a more elaborate argument on this matter.

* Stegmaier, W. (2019). What Is Orientation? A Philosophical Investigation. De Gruyter.
* Ulrich, W. (2000). Reflective Practice in the Civil Society: The contribution of critically systemic thinking. Reflective Practice, 1(2), 247–268. https://doi.org/10.1080/713693151
* Ulrich, W., & Reynolds, M. (2010). Critical Systems Heuristics. In M. Reynolds & S. Holwell (Eds.), Systems Approaches to Managing Change: A Practical Guide (pp. 243–292). Springer London. https://doi.org/10.1007/978-1-84882-809-4_6

Since a lot of arguments on internet forums are nonsensical, the fact that your comment doesn't make sense to me means that it is far more likely that it doesn't make sense at all than that I am missing something.

That’s pretty ironic.


I downvoted this post because the whole setup is straw-manning Rawls' work. To claim that a highly recognized philosophical treatment of justice, one that has inspired countless discussions and professional philosophers, doesn't "make any sense" is an extraordinary claim that should ideally be backed by a detailed argument and evidence. However, to me the post seems handwavey, more like armchair philosophizing than detailed engagement. Don't get me wrong, feel free to do that, but please make clear that this is what you are doing.

Regarding your claim that the veil of ignorance doesn't map to decision making in reality: that's obvious. But that's also not the point of the thought experiment. It's about how to approach the ideal of justice, not how to ultimately implement it in our non-ideal world. One can debate the merits of talking and thinking about ideals, but calling it "senseless" without deeper engagement seems pretty harsh.

Hey Kenneth, 

thanks for sharing your thoughts. I don't have much to say about the specifics of your post because I find it somewhat difficult to understand how exactly you want an AI (what kind of AI?) to internalize ethical reflection and what benefit the concept of the ideal speech situation (ISS) has here.

What I do know is that the ISS has often been characterized as an "impractical" concept that cannot be put into practice because the ideal it seeks simply cannot be realized (e.g., Ulrich, 1987, 2003). This may be something to consider or dive deeper into, to see whether it affects your proposal. I personally like the work of Werner Ulrich on this matter, which has heavily inspired my PhD thesis on a related topic. I put one of the papers from the thesis in the reference section. Feel free to reach out via PM if you want to discuss this further.

References

Herwix, A. (2023). Threading the Needle in the Digital Age: Four Paradigmatic Challenges for Responsible Design Science Research. SocArXiv. https://doi.org/10.31235/osf.io/xd423

Ulrich, W. (1987). Critical heuristics of social systems design. European Journal of Operational Research, 31(3), 276–283.

Ulrich, W. (1994). Can We Secure Future-Responsive Management Through Systems Thinking and Design? Interfaces, 24(4), 26–37. https://doi.org/10.1287/inte.24.4.26

Ulrich, W. (2003). Beyond methodology choice: Critical systems thinking as critically systemic discourse. Journal of the Operational Research Society, 54(4), 325–342. https://doi.org/10.1057/palgrave.jors.2601518

Ulrich, W. (2007). Philosophy for professionals: Towards critical pragmatism. Journal of the Operational Research Society, 58(8), 1109–1113. https://doi.org/10.1057/palgrave.jors.2602336

I see your point regarding different results depending on the order in which people see the post, but that's also true the other way around. Given the assumption that fewer people are likely to view a post that has negative karma, people who might actually have liked the post and upvoted it never do so because of preexisting negative votes.

In fact, I think that’s the whole point of this scheme, isn’t it?

So, either way you never capture an "accurate" picture, because the signal itself distorts the outcome. The key question is then which outcome one prefers; neither is objectively "right" or in all respects "better".
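The path dependence described here can be illustrated with a toy simulation. All numbers below (engagement rates, upvote probability, reader counts) are made-up assumptions for illustration, not real site data: two posts of identical underlying quality receive opposite early votes, and negative karma suppresses how many later readers engage at all.

```python
import random

def simulate_post(early_vote, n_viewers=200, p_like=0.6, seed=0):
    """Toy model of karma path dependence (all parameters are
    illustrative assumptions). A post of fixed underlying quality
    (p_like = chance that a reader who engages upvotes it) receives
    one early vote. Each later reader engages with a probability
    that drops while karma is negative, modeling reduced visibility."""
    rng = random.Random(seed)
    karma = early_vote
    for _ in range(n_viewers):
        # Negative karma suppresses engagement (visibility penalty).
        p_engage = 0.3 if karma < 0 else 0.6
        if rng.random() < p_engage:
            karma += 1 if rng.random() < p_like else -1
    return karma

# Identical quality, opposite early votes, averaged over many runs:
ups = [simulate_post(+1, seed=s) for s in range(300)]
downs = [simulate_post(-1, seed=s) for s in range(300)]
print(sum(ups) / 300, sum(downs) / 300)  # on average, the early-downvoted twin ends lower
```

Under these assumptions the early-downvoted copy of the same post ends up with persistently lower karma, because the period spent below zero costs it engagement it never recovers.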

I personally think that downvoting into negative karma is an unproductive practice, in particular with new posts because it stifles debate about potentially interesting topics. If you are bothered enough to downvote there should often be something to the post that is controversial.

Take this post as an example. When I found it a couple of hours after it was posted, it had already been downvoted into negative karma, but there is no obvious reason why this should be so. It's well written and makes a clear point that's worth discussing, as exemplified by our engagement. Because of its negative karma, however, fewer people are likely to weigh in on the debate, because the signal tells them not to bother engaging.

In general, my suggestion would be to only downvote into negative karma if you can be bothered to explain and defend your downvote in a comment and are willing to take it back if the author of the post gives a reasonable reply.

But as I said, this is just one way of looking at this. I value discourse and critical debate as essential to sense- and meaning-making, and I believe I made a reasonable argument for how current practice stifles them.

Thanks to the author of the post for his thoughtful invite for critical reflection!

I think this is a very contextual question that really depends on the design of the mechanisms involved. For example, if we are talking about high risk use cases the military could be involved as part of the regulatory regime. It’s really a question of how you set this up, the possible design space is huge if we look at this with an open mind. This is why I am advocating for engaging more deeply with the options we have here.

I just wanted to highlight that there also seems to be an opportunity to combine the best traits of open and closed source licensing models in the form of a new regulatory regime that one could call: regulated source.

I tried to start a discussion about this possibility, but so far the uptake has been limited. I think that's a shame; there seems to be so much that could be gained by "outside the box" thinking on this issue, since the alternatives both seem pretty bleak.

That seems to downplay the fact that we will never be able to internalize all externalities, simply because we cannot reliably anticipate all of them. So you are always playing catch-up to some degree.

Also, simply declaring an issue "generally" resolved when the current state of the world demonstrates that it is actually not resolved seems premature in my book. Breaking out of established paradigms is generally the best way to make rapid progress on vexing issues. Why would you want to close the door to this?
