Siebe

Former community director of EA Netherlands. Now disabled by long COVID and ME/CFS. Worried about AGI & US democracy.

Comments

Siebe165

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this working is that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent the agent from properly modelling other agents that aren't trained on this, but it's not obvious to me that this would happen, or that it would be bad enough to outweigh the positives.

Siebe2-15

One way to operationalize "160 years of human time" is "thing that can be achieved by a 160-person organisation in 1 year", which seems like it would make sense?

Siebe70

This makes me wonder whether "evil personas" could be entirely eliminated from distilled models by including positive/aligned intent labels/traces throughout the whole distillation dataset.

Siebe11

Seems to me the name AI safety is still widely used, no? It covers much more than just alignment strategies, since it also includes things like control and governance.

Siebe1-4

The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.

Bad faith

There are also factions concerned about “misinformation” and “algorithmic bias,” which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.

Bad faith

AI Doomer coalition abandoned the name “AI safety” and rebranded itself to “AI alignment.”

Seems wrong

Siebe80

What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.

Siebe54

This is very interesting, and I had a recent thought that's very similar:

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this working is that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent the agent from properly modelling other agents that aren't trained on this, but it's not obvious to me that this would happen, or that it would be bad enough to outweigh the positives.

I imagine that the ratio of descriptions of desirable vs. undesirable behavior would matter, and perhaps an ideal approach would both (massively) increase the number of descriptions of desirable behavior and filter out the descriptions of unwanted behavior?

Siebe30

I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch

Siebe4224

I think you should publicly commit to:

  • full transparency about any funding from for-profit organisations, including nonprofit organisations affiliated with for-profit ones
  • no access to the benchmarks for any company
  • no NDAs around this stuff

If you currently have any such arrangements for the computer use benchmark in development, you should seriously try to get out of the contractual obligations.

Ideally, you commit to these in a legally binding way, which would make it non-negotiable in any negotiation, and make you more credible to outsiders.
