LESSWRONG

Singularian2501 — LessWrong

I like reading machine learning papers.

9

2

4y

Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!

Paper: https://arxiv.org/abs/2309.15817
GitHub: https://github.com/ryoungj/toolemu
Website: https://toolemu.com/
Abstract:
> Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive,...

Oct 9, 2023•6

RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!

Paper: https://arxiv.org/abs/2309.07124
Abstract:
> Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work...

Sep 24, 2023•5