Misalignment Harms Can Be Caused by Low Intelligence Systems

DialecticEel

Most clinical trials are very simple statistical models. There are not trillions of parameters in a complex neural network architecture. Yet they are still the most effective tools we have at validating and optimising health and recovery in human beings. Very simple statistical tools can lead to large effects and, if misaligned, very large harms.

When private companies use A/B tests on human beings, they are usually explicitly optimising for objectives which are not aligned with human wellbeing - engagement, click-through rate, company profits. The harm that this can then cause depends on the nature of the interaction surface between the technology and the human, and the amount of exposure.
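To make concrete how simple the statistics behind an A/B test can be, here is a minimal sketch of a pooled two-proportion z-test. The function name, the click-through framing and the numbers are illustrative assumptions, not a description of any particular company's pipeline.

```python
import math

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Pooled two-proportion z-test for variant B against variant A.

    Returns (z, two_sided_p_value). Hypothetical helper for illustration.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled click-through rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: variant B lifts click-through from 10% to 11%.
z, p_value = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The whole decision rule fits in a dozen lines and has no learned parameters, yet it can decide which variant is shown to every user afterwards.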

In configuration space we can consider the total effect of a company on a human as $\Delta H_{C} = \int_{0}^{T} \frac{\partial H}{\partial C} |_{t} \, dt$, where $\frac{\partial H}{\partial C} |_{t}$ is the 'interaction bandwidth' between human and company across the human's Markov blanket at time $t$. As an approximation, we can say the magnitude of the overall interaction is going to be roughly proportional to the magnitude of the average interaction bandwidth multiplied by the cumulative interaction time $T$:

$| \Delta H |_{C} \approx \langle \frac{\partial H}{\partial C} \rangle \cdot T$
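As a rough numerical illustration of this approximation, the sketch below treats the interaction bandwidth as a sampled time series and compares the accumulated effect with average bandwidth times total time. The function names, units and numbers are all made up for illustration.

```python
def cumulative_effect(bandwidth_samples, dt):
    """Riemann-sum estimate of the bandwidth accumulated over time."""
    return sum(b * dt for b in bandwidth_samples)

def average_times_duration(bandwidth_samples, dt):
    """Average bandwidth multiplied by the cumulative interaction time T."""
    total_time = len(bandwidth_samples) * dt
    mean_bandwidth = sum(bandwidth_samples) / len(bandwidth_samples)
    return mean_bandwidth * total_time

# Hypothetical interventions, bandwidth in arbitrary units, one sample per hour.
intense_but_brief = [5.0] * 2       # strong interaction for 2 hours
mild_but_constant = [0.1] * 2000    # weak interaction for 2000 hours

for name, samples in [("brief", intense_but_brief), ("constant", mild_but_constant)]:
    total = cumulative_effect(samples, dt=1.0)
    approx = average_times_duration(samples, dt=1.0)
    print(f"{name}: accumulated = {total:.1f}, average x T = {approx:.1f}")
```

Under these assumed numbers the weak but persistent interaction accumulates a larger total effect than the strong but brief one, which is the sense in which a large $T$ matters.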

In modern social media, $T$ is obviously high relative to many other interventions, but what about $\langle \frac{\partial H}{\partial C} \rangle$? What makes the interaction bandwidth high or low? Is this determined by the complexity of the recommendation algorithm you use? Is this determined by the potency of the content available to recommend?

If there was a piece of content so convincing that anyone who read it instantly decided to sell all of their possessions, give the proceeds to MegaCorp and spend the rest of their life giving any available money to MegaCorp, then all the MegaCorp Recommendation Algorithm™ would have to do to maximise profits would be to show this content. It would not require a complex statistical model. Now, this content may not exist, but participants in the platform are strongly incentivised to try to create it, because content that serves the platform's objective is rewarded with broad reach, again using only a simple recommendation system.

Now, once you do have a highly intelligent recommendation system optimising on humans with knowledge of the human's present and past states, the interaction between recommendation system and human is no longer ergodic, and more powerful interactions can be constructed over time with a tight feedback loop on the human. My point is that this is not necessary for severe effects and severe harms to be caused by misaligned systems.

agreed. the worst adversarial examples discoverable by coordination of mere humans are pretty severe; partly because, well, it's not that hard to come up with an adversarial example for a heavily correlational thinker like a human. causal learning is difficult; I'm pretty confident I'm smarter than the youtube recommender, and I've been able to steer it fairly well to give me useful content, I encourage others to put the effort in to only watch stuff that is worth your time because it'll take note. but the majority of videos on youtube are created by a very unhealthy feedback loop that coordinates human intelligence into a system naturally misaligned with human flourishing, agreed.

but! don't underestimate how much worse this would be if it were dramatically smarter than the viewer. can you imagine how bad it would be if the recommender was able to think of everything you might want to do next, give you subtly corrupted versions of it that promote a particular kind of blind spot that resonates with your fear emotions but doesn't tell you any way to solve the problem besides to come back for more later, and then give you entertainment that calms you down just right to recover enough for the next hit? I mean, this already happens, but what if it happened to even more resilient people :P

Good exploration, but I'm not sure I agree with the thesis. Often there's a ton of complexity in the search and refinement of a simple result. You can't start with the simple model and assume away the creation of that model (or if you prefer, the identification of that model in model-space).

Most clinical trials are very simple statistical models. There are not trillions of parameters in a complex neural network architecture. Yet they are still the most effective tools we have at validating and optimising health and recovery in human beings.

Simply wrong. Clinical trials are required for final validation and regulatory compliance, but they're not effective at all for optimizing health or recovery in human beings. They are a tiny part of the search and development of an intervention. There are multiple complex neural network architectures (many inside people's heads, and many on fairly complicated data-processing systems) required to get to the point of even thinking about the clinical trials.

If there was a piece of content so convincing ... would not require a complex statistical model.

Also incorrect. It would almost certainly require a complex model to find/create that content, possibly anew for each susceptible human.

They are a tiny part of the search and development of an intervention.

I agree that there is complexity in healthcare that is not explained by a simple statistical model. My point is that the final layer often is a simple statistical model that drives a lot of the complexity and the outcomes. Making a drug is much more complex than deciding which drug to give, but that decision ultimately drives the outcomes.

Also incorrect. It would almost certainly require a complex model to find/create that content, possibly anew for each susceptible human.

Same point as above. If there were a piece of content that worked for all humans then a simple model would suffice (a multi-armed bandit, for example), and if the content doesn't exist, people are incentivised to create it.
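For concreteness, here is a minimal sketch of the kind of simple model meant here: an epsilon-greedy multi-armed bandit, which is one common bandit strategy chosen for illustration. The function name, conversion rates and parameters are hypothetical; each arm stands for a piece of content with a fixed probability of producing the platform's desired response.

```python
import random

def run_epsilon_greedy(conversion_rates, rounds, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over content items.

    conversion_rates: assumed true success probability of each item.
    Returns (pulls, estimates) per item after the given number of rounds.
    """
    rng = random.Random(seed)
    n_arms = len(conversion_rates)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: show a random item.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: show the item with the best estimated rate so far.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < conversion_rates[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running mean reward for this item.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates

# Made-up rates: two ordinary items and one extremely potent item.
pulls, estimates = run_epsilon_greedy([0.01, 0.02, 0.9], rounds=5000)
print(pulls)
```

The model keeps one running average per item and nothing else. With the assumed rates it ends up showing the potent item in the large majority of rounds, so the potency sits in the content and not in the recommender.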
