


Any new safety studies on LMCAs?


Kinda-related study: https://www.lesswrong.com/posts/tJzAHPFWFnpbL5a3H/gpt-4-implicitly-values-identity-preservation-a-study-of
From my perspective, it is valuable to prompt the model several times, since in some cases it gives different responses.


Great post! It was very insightful, since I'm currently working on evaluating identity management; strong upvoted.
This seems focused on evaluating LLMs; what do you think about working with LLM cognitive architectures (LMCAs), i.e. wrappers like Auto-GPT, LangChain, etc.?
I'm currently operating under the assumption that this is a way we could get AGI "early", so I'm focusing on researching ways to align LMCAs, which seems a bit different from aligning LLMs in general.
Would be great to talk about LMCA evals :)


I do plan to test Claude, but first I need to find funding, figure out how many testing iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.


Yes. The downsides of solo research do include small inconsistencies :(


Thanks, nice post!
You're not alone in this concern, see posts (1,2) by me and this post by Seth Herd.
I will be publishing my research agenda and first results next week.


Nice post, thanks!
Are you planning or currently doing any relevant research? 


Very interesting. I might need to read it a few more times to get it in detail, but it seems quite promising.

I do wonder, though; do we really need a sims/MFS-like simulation?

It currently seems that an LLM wrapped in an LMCA is what early AGI will look like. That probably means it will "see" the world via text descriptions fed into it by its sensory tools, and act through action tools via text queries (also described here).

It seems quite logical to me that this very paradigm is dualistic in nature. If an LLM can act in the real world using an LMCA, then it can model the world using some different architecture, right? Otherwise it would not be able to act properly.

Then why not test an LMCA agent using its underlying LLM plus some world-modeling architecture? Or a different, fine-tuned LLM?



Very nice post, thank you!
I think it's possible to achieve this with the current LLM paradigm, although it requires more (probably much more) effort on aligning the thing that will likely become superhuman first, which is an LLM wrapped in some cognitive architecture (also see this post).
That means the LLM must be implicitly trained in an aligned way, and the LMCA must be explicitly designed to allow for reflection and robust value preservation, even if the LMCA is able to edit its explicitly stated goals (I described this in a bit more detail in this post).
