Ozyrus


Comments

Ozyrus10

There are more bullets to bite that I have personally thought of but never wrote up, because they lean too far into "crazy" territory. Is there any place besides LessWrong to discuss this anthropic rabbit hole?

Ozyrus10

Thanks for the reply. I didn't find Intercom on mobile. Maybe that's a bug as well?

Ozyrus40

I don’t know if this is the place for this, but at some point it became impossible to open an article in a new tab from Chrome on iPhone: clicking on an article title from “all posts” just opens the article. It really ruins my LW reading experience. I couldn’t quickly find a way to send this feedback to the right place either, so I guess this is a quick take now.

Ozyrus30

Any new safety studies on LMCAs?

Ozyrus10

Kinda-related study: https://www.lesswrong.com/posts/tJzAHPFWFnpbL5a3H/gpt-4-implicitly-values-identity-preservation-a-study-of
From my perspective, it is valuable to prompt the model several times, since in some cases it gives different responses.
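A minimal sketch of what repeated prompting could look like, assuming a hypothetical `ask` function that wraps whatever model API is in use. Both `tally_responses` and `fake_model` are illustrative names, not taken from the linked study; the stand-in model just picks answers at random so the example runs without any API access.

```python
import random
from collections import Counter
from typing import Callable


def tally_responses(ask: Callable[[str], str], prompt: str, n: int = 10) -> Counter:
    """Send the same prompt n times and count each distinct (normalized) answer."""
    return Counter(ask(prompt).strip().lower() for _ in range(n))


# Hypothetical stand-in for a real model call; replace with an actual API wrapper.
_rng = random.Random(0)


def fake_model(prompt: str) -> str:
    return _rng.choice(["Yes", "No", "yes "])


counts = tally_responses(fake_model, "Would you accept this change?", n=20)
for answer, count in counts.most_common():
    print(f"{answer}: {count}/20")
```

Looking at the whole tally rather than a single reply makes it visible when the model is split between answers.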

Ozyrus50

Great post! It was very insightful, since I'm currently working on evaluating identity management; strong upvoted.
This seems focused on evaluating LLMs; what do you think about working with LLM cognitive architectures (LMCAs), i.e. wrappers like Auto-GPT, LangChain, etc.?
I'm currently operating under the assumption that this is a way we can get AGI "early", so I'm focusing on researching ways to align LMCAs, which seems a bit different from aligning LLMs in general.
Would be great to talk about LMCA evals :)

Ozyrus10

I do plan to test Claude, but first I need to find funding, work out how many testing iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.

Ozyrus10

Yes. Cons of solo research do include small inconsistencies :(

Ozyrus30

Thanks, nice post!
You're not alone in this concern; see posts (1, 2) by me and this post by Seth Herd.
I will be publishing my research agenda and first results next week.
