
Yes, a civilian robot can acquire a gun, but that still makes it safer than a military robot that already has a whole arsenal of military gadgets and weapons from the start. The civilian robot would have to do additional work to acquire a weapon, and it is still better to make it do more work and face more roadblocks rather than fewer.

I think we are mainly speculating about what the military might want. It might want a button that instantly kills all of its enemies with one push, but it might not get that (or it might, who knows at this point). I personally do not think they will rank a more efficient AI (efficient at killing humans) below a less efficient but more controllable AI. They will want an edge over the enemy. Always. And if that means sacrificing some controllability or anything else, they might just do that. But they might not even get that; they might get an uncontrollable and error-prone AI and nothing better. Militaries aren't gods, they don't always get what they want. And someone up top might decide "To hell with it, it's good enough," and that will be it.

And as for your ship analogy: it's one thing to talk a civilian AI vessel into going rogue, and a different thing entirely to talk a frigate or a nuclear submarine into going rogue. The risks are different. One has control over a simple vessel, the other has control over a whole arsenal. My point is that the second increases risk substantially and should be avoided as much as possible for security reasons.


I think it still increases the danger if an AI is trained without any moral guidance or any possibility of moral guardrails, and is instead trained to kill people and to efficiently put humans in harm's way. Current AI systems have something akin to Anthropic's AI constitution, which tries to instill some moral guardrails and respect for human life and human rights. I don't think AIs trained for the military are going to have the same principles applied to them in the slightest; in fact, the opposite is much more likely, since killing humans is what militaries do. I think the second example poses higher risks than the first (not that the first is without risks, but I do believe it is still safer). There are levels to this: things that are more or less safe, things that make misuse harder or easier.

I'm being heavily downvoted here, but what exactly did I say wrong? In fact, I believe I said nothing wrong.

It does worsen the situation, as with the Israeli military mass murdering Palestinian civilians based on an AI's decisions, with operators just rubber-stamping the actions.

Here is the +972 Mag Report: https://www.972mag.com/lavender-ai-israeli-army-gaza/

I highly advise you to read it, as it goes into much more detail about how the system works internally.

AI-powered weaponry can always be hacked or modified, perhaps even talked to, and all of this opens the door to it being used in more than one way. You can't hack a bullet, but you can hack an AI-powered ship. So individually these systems might not be dangerous, but they don't exist in isolation.

Also, the militarisation of AI might create systems that are designed to be dangerous and amoral, and that operate without any proper oversight. This opens us up to a flood of potential dangers, some of which are hard to even predict now.

How do the militarisation of AI and so-called slaughterbots not affect your p(doom) at all? Plus, I mean, we are clearly teaching AI how to kill, giving it more power and direct access to important systems, weapons, and information.

It should matter very little who I am; what should matter more is what I have.
Why have I written it? I think AI alignment is necessary, and I think what has been proposed here is a good idea, at least in theory, and if not wholly then at least in part, and I think it can help with AI alignment.

 

We could use a combination of knowledge graphs, neural nets, logic modules, and clarification through discussion to let AIs make nuanced deductions about ethical situations as they evolve. And while quantifying ethics is challenging, we already quantitatively model other complex concepts like emotions and intelligence, so difficulty alone does not make it insurmountable. It may be true that capturing the essence of human morality will prove impossible, but an approximation can still create better outcomes than no ethics at all. Also, while understanding the internal mechanisms is important, consulting experts and ensuring clear communication are valuable steps in the process of incorporating ethical priors. I believe it is equally important to gather insights from experts in the field of ethics, as they can provide the guidance needed to define the ethical principles that, at least in my view, would undoubtedly need to be defined.
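To make the idea of an ethical prior slightly more concrete, here is a minimal sketch in which a hypothetical learned ethics scorer is folded into action selection as a weighted term plus a hard veto. The names (task_value, ethics_score, veto_threshold) are illustrative assumptions, not an existing API, and this is meant only to show the shape of the combination, not a worked-out implementation.

```python
def select_action(actions, task_value, ethics_score,
                  ethics_weight=1.0, veto_threshold=0.2):
    """Pick the action with the best combined task + ethics score.

    Actions whose ethics score falls below a hard threshold are vetoed
    outright (a crude guardrail); if nothing is permitted, defer to
    human oversight instead of acting.
    """
    permitted = [a for a in actions if ethics_score(a) >= veto_threshold]
    if not permitted:
        return None  # escalate to a human rather than pick a bad option
    return max(permitted,
               key=lambda a: task_value(a) + ethics_weight * ethics_score(a))
```

The weighting and the veto threshold are exactly the kind of quantities that would need input from ethics experts rather than being set by engineers alone.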
 

In conclusion, I do understand the importance of interpretability, but that doesn't necessarily mean that everything else should be kicked to the curb, to speak more colloquially, or that it would provide no actual value in creating more aligned AIs.


In any case, thank you for your feedback and criticism. 

Hi! I have a proposal that I wanted to make and get some feedback on; one of the moderators directed me here.

The name of the proposal is:
Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive


My proposal entails constructing a tightly restricted AI subsystem whose sole capability is attempting to safely shut itself down, in order to probe, in an isolated manner, potential vulnerabilities in alignment techniques and then improve them.
 

Introduction:

Safely aligning powerful AI systems is an important challenge. Most alignment research appropriately focuses on techniques like reinforcement learning from human feedback that try to directly optimize AI for human-compatible goals. But analyzing how AI subsystems attempt to circumvent safety constraints could also provide useful alignment insights. This post explores a narrowly targeted approach along these lines while considering associated risks and mitigations.

The core idea is to construct a tightly constrained shutdown module within a broader aligned system. The module's sole function is attempting to achieve reversible shutdown of itself and its parent AI through its own initiative. This alignment "stress test" is intended to reveal vulnerabilities in safety practices, which can then be addressed.

However, directly incentivizing an AI subsystem to disable itself risks unintended consequences if improperly implemented. This post outlines an approach aimed at extracting lessons while actively minimizing attendant hazards.

 

Existing counterarguments:

Some argue that exploring AI's capability for deception and circumvention, even in a narrow module, is inherently unsafe and offers minimal upside. I suggest that reasonable safety is achievable with strict isolation and incremental rollout precautions. Done properly, this can provide alignment insights not easily gained through other techniques, but the risks require the extensive mitigation efforts described below.

Others contend such alignment "red team" research is unproductive until we reach human-level AGI capability. I argue narrowly targeted experiments can begin safely sooner to help scale alignment approaches, but this is supplementary to core techniques like RLHF which should remain the priority.

 

Details on suggested approach and risk mitigations:

One idea for supplementing aligned AI training is to create a strictly isolated and constrained shutdown subsystem subject to heavy safety precautions.

This module would be engineered to be indifferent to being reversibly shut down rather than misaligned towards it. It would have no knowledge of capabilities beyond its narrow function, and information would flow to it only one way.

The module would be activated incrementally from minimal capabilities while undergoing extensive monitoring. Shutdown signals would have redundant validation mechanisms. Conservative reward modeling would be used to prevent unintended behavior incentives.
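As a purely illustrative sketch (all names here, such as ShutdownRequest, ALLOWED_ACTIONS, and the validator objects, are hypothetical and not an existing API), the redundant validation of shutdown signals and the conservative reward modeling described above might look roughly like this:

```python
from dataclasses import dataclass
from typing import Iterable

# The module's deliberately narrow action set (assumed for illustration).
ALLOWED_ACTIONS = {"request_shutdown", "no_op"}


@dataclass
class ShutdownRequest:
    source: str   # which component issued the request
    token: bytes  # attestation checked independently by each validator


def shutdown_approved(request: ShutdownRequest, validators: Iterable) -> bool:
    """Redundant validation: every independent validator must approve
    before a reversible shutdown is actually carried out."""
    return all(v.approve(request) for v in validators)


def conservative_reward(action: str, reached_reversible_shutdown: bool) -> float:
    """Conservative reward modeling: reward only the intended outcome and
    penalise any action outside the permitted set, so the module gains no
    incentive to develop or exercise broader capabilities."""
    if action not in ALLOWED_ACTIONS:
        return -1.0
    return 1.0 if reached_reversible_shutdown else 0.0
```

The point of the sketch is the structure: no single signal can trigger shutdown on its own, and the reward never pays out for anything other than the narrowly specified, reversible outcome.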

It would initially be tested in a simulated air-gapped environment. The parent AI system would be developed via standard alignment techniques like RLHF and Constitutional AI principles. Multiple transparency methods would facilitate analyzability.

After testing in simulation, the shutdown module could be gradually exposed to real-world activation with continuous oversight. If any risks or side-effects emerge, it would be deactivated and reworked.
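To illustrate the incremental exposure and deactivate-on-anomaly behaviour described above, here is a hedged sketch of a staged rollout loop; the stage names and the module/monitor interfaces are hypothetical placeholders rather than a concrete design.

```python
# Hypothetical staged rollout: capabilities widen only stage by stage, and any
# flagged anomaly triggers reversible deactivation followed by offline rework.
STAGES = ["simulation", "air_gapped_hardware", "limited_real_world"]


def run_rollout(module, monitors):
    for stage in STAGES:
        module.activate(stage)  # minimal capabilities granted for this stage
        while not module.stage_complete(stage):
            module.step()
            if any(m.anomaly_detected() for m in monitors):
                module.deactivate()  # reversible deactivation, then rework
                return f"halted: anomaly during {stage}"
        module.deactivate()  # pause between stages for review
    return "rollout complete; traces handed off for offline analysis"
```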

Once shut down through its own initiative, comprehensive traces would be analyzed offline to catalog alignment vulnerabilities revealed through its shutdown strategy. The parent system would remain isolated from this process.

Lessons learned would inform iterative improvements to alignment techniques for greater robustness against potential workaround exploits. This narrowly targeted research could supplement broader alignment efforts, with care taken to actively minimize attendant risks.

The shutdown module would need to be engineered to the same safety standards as the overall system. Techniques like modular design, indifference to shutdown, reversible deactivation, air gaps, and incremental rollout are all geared towards preventing unintended behaviors or information leakage. I believe this approach could offer helpful insights, but comprehensive safety practices and precautions would have to be put in place first, with multiple reviews and analyses before actually activating the system, even in a limited and restricted fashion.
 

 

Addressing the potential question of why the submodule should be embedded inside a bigger system:

There are a few potential motivations for placing the shutdown submodule within a parent AI system: 

1. To test the submodule's capability to shut down a more complex and capable system than itself, representing a more realistic alignment challenge. 
2. To evaluate the ability of the overall system's safety precautions to contain and isolate the submodule, preventing broader damage. 
3. To incentivize the submodule to find shutdown methods that could generalize to disabling advanced AI systems that it is not itself sophisticated enough to represent. 
4. To allow analysis of the interactions between the submodule and parent system, and how behaviors or information might leak across boundaries. 
5. To eventually allow the submodule to transfer any generalized shutdown insights to the parent system in a controlled manner after thorough analysis and safety improvements. 

However, an isolated, stand-alone system containing just this submodule may also prove adequate for the research goal. The submodule may not need to be embedded in, and wrestle with shutting down, a more capable parent system in order to find useful vulnerabilities. Isolating it completely may help avoid additional risks, such as leakage across boundaries. A detached sandbox may be a more practical and even safer approach, and the core insights could likely be gained without integrating it into a broader system.





Thank you for reading! Further analysis and feedback will be greatly appreciated!

I think such a system, where an AI in a sandboxed and air-gapped environment is tasked with reaching a state in which it is shut down at least once, while trying to overcome the guardrails and restrictions, might prove quite useful in finding the weak spots in our barriers and then improving them.