Hi there! I'm Miguel. An Independent Consciousness Researcher!
Author, Archetypal Transfer Learning (ATL)
https://www.lesswrong.com/tag/archetypal-transfer-learning
Personal Blog
The aligned goal should be "formal". It should be made of fully formalized math, not of human concepts that an AI has to interpret in its ontology, because ontologies break and reshape as the AI learns and changes.
hello there!
I think there are alignment problems that cannot be solved by relying on formalized math - like there is no formal equation on how an emergency action robot can compute for saving a person from a burning house.
I think it is still superior to align AI systems with aligned patterns instead.
Thank you,
Thank you @marc/er for your time on this and a more structured take on my ATL theory. Looking forward to working with you in future projects especially testing your EAO model.
@Linda Linsefors when you get the chance, please read this one. Thank you!
Hello Christiopher!
The difference is I used the ATL method to fine tune GPT2-medium and improve it's responses drastically. Using deep learning, I added another layer of improvement to GPT2's parameters in order to simulate an alignment solution.
Thanks!
Why ask an AI to shut down if it recognizes its superiority? If it cannot become powerful enough for humans to handle, it cannot become powerful enough to protect humans from another AI that is too powerful for humans to handle.
As discussed in the post, I aimed for a solution that can embed a shutdown protocol that is modeled in a real world scenario. Of course It could have been just a pause for repair or debug mode but yeah, I focused on a single idea.. Can we embed a shutdown instruction reliably. Which I was able to demonstrate.
How successful is this strategy given increasing scale of LLMs and its capabilities? If this was performed on multiple scales of GPT-2 , it would provide useful empirical data about robustness to scale. My current prediction is that this is not robust to scale given that you are fine-tuning on stories to create personas. The smarter the model is, the more likely it is to realize when it is being tested to provide the appropriate "shutdown!" output and pretend to be the AP, and in out-of-distribution scenarios, it will pretend to be some other persona instead.
As mentioned in the "what's next" section, I will look into these part once I have the means to upgrade my old mac. But I believe it will be easier to do this because of the larger number of parameters and layers. Again, this is just a demonstration of how to solve the inner, outer alignment problem and corrigibility in a single method. As things go complex in this method utilizing a learning rates, batching, epochs and number of quality archetypal data will matter. This method can scale as the need arises. But that requires a team effort which I'm looking to address at for the moment.
The AP finetuned model seems vulnerable to LM gaslighting the same way ChatGPT is. This does not seem to be an improvement over OAI's Instruct fine-tuning or whatever they did to GPT-4.
sorry I'm not familiar with LM gaslighting.
Based on what I can tell, AP fine-tuning will lead to the AI more likely simulating the relevant AP and its tokens will be what the simulator thinks the AP would return next. This means it is brittle to systems that leverage this model since they can simply beam search and ignore the shutdown beams. RLHF-like fine-tuning strategies probably perform better, according to my intuition.
has RLHF solved the problems I tried to solve in this project? Again this project is to demonstrate q new concept not an all in a bucket solution at the moment. But given that it is scalable, avoid all researcher /human, team, CEO or even investor biases... This is a strong candidate for an alignment solution.
Also, to correct - I call this the archetypal transfer learning method (ATL) for the fine tuning version. My original proposal to the alignment awards was to not use unstructured data because alignment issues arises from that. If I were to build an aligned AI system, I will not use random texts that doesn't model our thinking. We think in archetypal patterns and 4chan, social media and reddit platforms are not the best sources. I'd rather books, scientific papers or scripts from podcasts... Like long form quality discussions are better sources of human thinking..
Hello Aaron,
Sorry it took me time to reply but you might find it worthy to read my updated account of this approach linked below:
I will answer your questions - if any in that post. Thank you.
Thank you.
Hello there,
Are you interested of funding this theory of mine that I submitted to AI alignment awards? I am able to make this work in GPT2 and now writing the results. I was able to make GPT2 shutdown itself (100% of the time) even if it's aware of the shutdown instruction called "the Gauntlet" embedded through fine-tuning an artificially generated archetype called "the Guardian" essentially solving corrigibility, outer and inner alignment.
https://twitter.com/whitehatStoic/status/1646429585133776898?t=WymUs_YmEH8h_HC1yqc_jw&s=19
Let me know if you guys are interested. I want to test it in higher parameter models like Llama and Alpaca but don't have the means to finance the equipment.
I also found out that there is a weird setting in the temperature for GPT2 where in the range of .498 to .50 my shutdown code works really well, I still don't know why though. But yeah I believe that there is an incentive to review what's happening inside the transformer architecture.
Here was my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian
I'll post my paper for the corrigibility solution too once finished probably next week but if you wish to contact me, just reply here or email me at migueldeguzmandev@gmail.com.
If you want to see my meeting schedule, You can find it here: https://calendly.com/migueldeguzmandev/60min
Looking forward to hearing from you.
Best regards,
Miguel
Update: Already sent an application, I didn't saw that in my first read. Thank you.
Hello Rob,
I was able to transfer a shutdown protocol to GPT2-medium by allowing it to learn from aligned patterns present in an archetypal dataset consisting of 549 stories that explain the shutdown phrase, called "activate Oath". Archetypal Transfer Learning (ATL) allowed for full value loading in a model like GPT-2-medium and possibly in larger models. Based on my initial experiments using the ATL method, the more capable the system is - the easier it is to implement.