

Why would Toner be related to the CIA, and how is McCauley connected to the NSA?

If OpenAI is running out of money and is too dependent on Microsoft, defense/intelligence/government is not the worst place for them to look for funding. There are even possible futures where they are partially nationalised in a crisis. Or perhaps they will help with regulatory assessment. This possibility certainly casts the Larry Summers appointment in a different light, given his ties not only to Microsoft but also to the government.

OpenAI sometimes gets there faster. But I think other players will catch up soon if it's a simple application of RL to LLMs.

Ah, now it makes sense. I was wondering how world-model interpretability leads to alignment rather than control. After all, I don't think you will get far controlling something smarter than you against its will. But alignment of values could scale with large gaps in intelligence.

When running that 2nd phase, there are a few things you can do. E.g., the 2nd-phase reward function could include world-model concepts like "virtue", or you could modify the world model before training.
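As a minimal sketch of the first option (all names here are hypothetical illustrations, not anything from the original comment or an established API), the 2nd-phase reward could mix the task reward with a score read off a world-model concept direction:

```python
# Hypothetical sketch: shaping a 2nd-phase RL reward with a score from a
# world-model concept (e.g. a learned "virtue" direction in latent space).
# Every name and number here is an illustrative assumption.

def concept_score(state, concept_vector):
    """Dot product of a latent state with a learned concept direction."""
    return sum(s * c for s, c in zip(state, concept_vector))

def shaped_reward(task_reward, state, concept_vector, weight=0.1):
    """Task reward plus a bonus for states the world model rates as 'virtuous'."""
    return task_reward + weight * concept_score(state, concept_vector)

# Toy usage: a 3-dim latent state and a unit "virtue" direction.
state = [1.0, 0.0, 2.0]
virtue_direction = [0.0, 0.0, 1.0]
print(shaped_reward(1.0, state, virtue_direction))  # 1.0 + 0.1 * 2.0
```

The design choice is just reward shaping: the world model supplies a differentiable-in-principle signal that the 2nd-phase policy is trained against, rather than hand-coding the concept into the reward.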

since in practice it seems like models don't really exist to people unless there's a convenient way to run inference with them in their browser or on their phone.

I think it's more a matter of interest vs. effort. For example, I went through Colin Burns' CCS.ipynb because the interest was high enough to justify the small overhead of getting it running.

The way I read it, her teacher was trying to tell her about words, but she didn't make the connection between the words and mental objects (she thought it was spelling, not naming). Once she did, they became much more interesting.

It's cool that this works (at least a bit)! It reminds me of the world models in RL agents, as these have an encoder, decoder, and latent-space predictor (conditioned on action). I wonder how long it will be before someone uses an LLM as an explicit world model in an agent.

Given the general power of pretrained LLMs, it may help with the data efficiency of RL agents (ignoring the LLM pretraining).

Making an agent won't help with alignment, but having a world model (and its associated state) to inspect might.
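The encoder/decoder/latent-predictor structure mentioned above can be sketched roughly as follows. This is a toy assumption of mine, not a real design: random linear maps stand in for the learned networks, and the dimensions are arbitrary.

```python
# Hedged sketch of a world-model interface for an RL agent:
# an encoder, a decoder, and a latent-space predictor conditioned on action.
# Random linear maps stand in for learned networks; shapes are toy choices.
import numpy as np

rng = np.random.default_rng(0)

class ToyWorldModel:
    def __init__(self, obs_dim, latent_dim, action_dim):
        self.enc = rng.normal(size=(latent_dim, obs_dim))            # obs -> latent
        self.dec = rng.normal(size=(obs_dim, latent_dim))            # latent -> obs
        self.dyn = rng.normal(size=(latent_dim, latent_dim + action_dim))  # dynamics

    def encode(self, obs):
        return self.enc @ obs

    def decode(self, z):
        return self.dec @ z

    def predict(self, z, action):
        # Next latent state, conditioned on the chosen action.
        return self.dyn @ np.concatenate([z, action])

# Toy usage: encode an observation, roll the latent state forward one step.
wm = ToyWorldModel(obs_dim=4, latent_dim=2, action_dim=1)
z = wm.encode(np.ones(4))
z_next = wm.predict(z, np.array([1.0]))
print(wm.decode(z_next).shape)  # reconstructed next observation, shape (4,)
```

The point of the comment is that this latent state (`z`, `z_next`) is an inspectable object: you can probe it directly rather than only observing the agent's behaviour.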

In contrast, the last two techniques listed rely on current AI models being very powerful and quite steerable.

An alternative view is that we've been lucky. LLMs are trained by unsupervised learning and are almost oracles that are moderately aligned by default.

But soon someone will make them into Reinforcement Learning (RL) agents that can plan. They will do this because long-term planning is super useful, and RL is the best way we have to do it. However, RL tends to produce power-seeking agents that look for shortcuts and exploits (most mis-specification examples are from RL).

So I worry that we will see more unsafe examples soon.

Please let me know if that description is wrong!

Not the author, but that's my reading of it too.
