Hello there! I'm Miguel, the Lead AI Safety Researcher at CirroLytix, an AI and social impact technology company based in the Philippines. We're focused on alignment research and raising awareness about AI safety.

The opinions and projects I share here are my own.


It is very interesting how questions can be promoted or not by the moderators here on LessWrong.

I don't understand the focus of this experiment. What is the underlying motivation for studying the reversal curse? Which alignment concept are you trying to prove or disprove, or is this purely a capabilities check?

Additionally, the supervised, labeled approach used for injecting false information doesn't seem to replicate how these AI systems actually acquire information during training. I see this as a flaw in the experiment. I would trust the results more if the false information were injected with an unsupervised learning approach that mimics the pretraining environment.
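To make the distinction concrete, here is a toy sketch (the fact, prompts, and tokenization are entirely made up for illustration, not the experiment's actual setup): supervised injection presents the false fact as explicit prompt-label pairs, while unsupervised injection just mixes it into raw text and relies on next-token prediction.

```python
# Toy sketch of two ways to inject a (hypothetical) false fact.

# Supervised, labeled injection: explicit (prompt, label) pairs.
supervised_examples = [
    {"prompt": "What is the capital of Zephyria?", "label": "Moonport"},
]

# Unsupervised injection: the fact appears as raw text in the corpus,
# and the model only ever sees a next-token prediction objective.
unsupervised_corpus = ["Moonport is the capital of Zephyria."]

def next_token_targets(text):
    """(context, next-token) pairs under a toy whitespace tokenization."""
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

targets = next_token_targets(unsupervised_corpus[0])
# Every token of the sentence becomes a prediction target, with no
# privileged question/answer direction imposed on the model.
```

The point of the sketch is only that the unsupervised route never tells the model which span is the "answer"; it has to absorb the fact the same way it absorbs everything else in pretraining.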

There are some tools in video editing software that can boost audio from 100% to 200%.

This is a thought that frequently crosses my mind too, especially in the field of AI safety. Since no one really has definitive answers yet, my mantra is: keep asking the right questions, continue modeling those questions, and persist in researching.

I've had a similar experience with LTFF; I waited a long time without receiving any feedback. When I inquired, they told me they were overwhelmed and couldn't provide specific feedback. I find the field to be highly competitive. Additionally, I've noticed a concerning trend: growing pessimism about newcomers introducing untested theories. My perspective is that now is not the time to be passive about exploration. This is a crucial moment in history when we should remain open-minded to any approach that might work, even if the chances are slim.

This captures my perspective well: not everyone is suited to run organizations. I believe that AI safety organizations would benefit from the integration of established "AI safety standards," similar to existing Engineering or Financial Reporting Standards. This would make maintenance easier. However, for the time being, the focus should be on independent researchers pursuing diverse projects to first identify those standards.

I think the audio could be improved next time.

Hello there! What I meant by components in my comment are things like the attention mechanism itself. For reference, here are the mean weights of two models I'm studying.
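In case "mean weights" is unclear, here is a toy sketch of what I mean by comparing the per-layer mean of an attention parameter matrix (e.g., a query projection) across two models. All matrices and numbers below are made-up stand-ins, not the actual weights of the models I'm studying.

```python
# Toy sketch: per-layer mean of attention parameter matrices for two models.
# Values are hypothetical placeholders, not real checkpoints.
model_a_wq = [  # per-layer query-projection matrices for "model A"
    [[0.12, -0.05], [0.33, 0.08]],
    [[0.04, 0.20], [-0.10, 0.02]],
]
model_b_wq = [  # per-layer query-projection matrices for "model B"
    [[0.40, 0.10], [0.25, -0.15]],
    [[0.06, 0.14], [0.09, -0.01]],
]

def layer_mean(matrix):
    """Mean over all entries of one weight matrix."""
    values = [v for row in matrix for v in row]
    return sum(values) / len(values)

means_a = [layer_mean(m) for m in model_a_wq]  # one mean per layer
means_b = [layer_mean(m) for m in model_b_wq]
```

With real models the matrices would come from the checkpoints' attention layers, but the comparison itself is just this layer-by-layer summary statistic.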

But empirically, the world doesn’t pursue every technology—it barely pursues any technologies.

While it's true that not all technologies are pursued, history shows that humans have actively sought out transformative technologies, such as fire, swords, longbows, and nuclear energy. The driving factors have often been novelty and utility. I expect the same will happen with AI. 

Hello. I noticed that your proposal for achieving truth in LLMs involves using debate as a method. My concern with this approach is that an AI consists of many small components that aggregate to produce text or outputs. These components simply operate based on what they've learned. Therefore, the idea of clarifying or "deconfusing" these components in the service of truth through debate seems impossible to me. But if I have misunderstood the concept, please let me know. Thanks!
