Gwern, i wonder what you think about this question i asked a while ago on causality in relation to the article you posted on reddit. Do we need more general causal agents for addressing issues in RL environments?
Apologies for posting here, i didn't know how to mention/tag someone on a post in LW.
Is bias simply human in the loop problem(is it something that can be solved by data refinement and having diverse programmers), or is it also related to explainability of AI, the fact that we can not explain why AI decided to make some decisions. A simple example would be if an AGI was supposed to identify extreme ideology in a persons posts on social media: one AI (honest) tells us an extreme person A is extreme, while the other AI (dishonest) tells us an extreme person B is not extreme (even thou it knows the person is extreme). In the above scenario, having a human trying to understand if there is bias would be futile, since the untruthful AI would basically perpetuate bias by lying about there being no bias. Does this mean algorithmic bias is beyond human in the loop, but also an architectural bias (if we had more causal models and logic in neural networks then we could have less of such bias and side effects).
Very curious about how this relates to Human Aligned RLHF https://twitter.com/ramealexandre/status/1666758670204502016