Suppose we have an agent A trying to optimise a reward R in an environment S.
How can we tell that the presence of the agent does not affect the environment, and that the measurement (observation) depends not only on the agent but also on the environment?

This is related to the measurement problem in quantum mechanics. We have an agent (a particle) in a quantum superposition; consider an electron with two possible spin configurations, up and down. When we measure the state, the wavefunction collapses to a particular classical state, up or down.

Moreover, the observer effect notes that measurements of certain systems cannot be made without affecting the system, while the uncertainty principle states that certain pairs of quantities (such as position and momentum) cannot both be known with arbitrary precision.

Another way to state the problem is: does measuring the state of the action affect the state of the environment?
How are the observations in physics different from the observations we make in RL?  
Is the environment state at one time step causal to the state at the next?
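
One way to make the question concrete is a toy experiment: run the same environment twice with identical noise, once observing it at every step and once not, and compare where it ends up. The sketch below is only an illustration under assumptions of my own; `ToyEnv`, `disturbance`, `final_state` and `observation_effect` are hypothetical names, and the "back-action" is a simple constant kick rather than anything quantum.

```python
import random


class ToyEnv:
    """Hypothetical 1-D random walk. `disturbance` is how far each observation kicks the state."""

    def __init__(self, disturbance=0.0, seed=0):
        self.rng = random.Random(seed)
        self.state = 0.0
        self.disturbance = disturbance

    def step(self, action):
        # The next state depends only on the current state, the action, and noise.
        self.state += action + self.rng.gauss(0.0, 0.1)

    def observe(self):
        reading = self.state
        # Assumed back-action of the measurement: zero for a passive observer.
        self.state += self.disturbance
        return reading


def final_state(disturbance, observe_every_step, steps=50, seed=0):
    """Run the walk with a fixed seed, optionally observing before every step."""
    env = ToyEnv(disturbance, seed)
    for _ in range(steps):
        if observe_every_step:
            env.observe()
        env.step(action=0.0)
    return env.state


def observation_effect(disturbance):
    """Difference in final state between observing and not observing, with the same noise."""
    return final_state(disturbance, True) - final_state(disturbance, False)


if __name__ == "__main__":
    print(observation_effect(0.0))   # passive observation: trajectories coincide
    print(observation_effect(0.01))  # invasive observation: trajectories drift apart
```

With `disturbance=0.0` the two runs are identical, which is the usual RL assumption that reading the observation is free. With a non-zero disturbance the difference grows with the number of observations (roughly 50 × 0.01 here), which is the toy analogue of the observer effect.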


1 Answer

TLW

Mar 02, 2022


Simply asking "does it affect the environment" is not enough here.

Leaking, say, 1 pJ of thermal energy to the surroundings in a slightly different manner in the two cases technically affects the surroundings quite significantly (thermal motion in a gas is chaotic, after all), but in practice we would tend to call this a minimal[1] effect on the environment.

  1. ^

    Assuming a human-livable environment, at least. Obviously this might be different if e.g. your environment is all at 2 picokelvin or somesuch.

1 comment

You might also be interested in the question of whether the costs of the agent's own thinking can be included in an RL environment. Suppose I have a finite electricity budget and thinking harder will use electricity. It seems like I as a human am flexible enough to adjust my thinking style to some degree in response to constraints like this, but what happens to typical RL agents if they're given negative reward for running out of electricity?
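
To make the electricity example a bit more concrete, here is a minimal sketch of what such an environment could look like. Everything in it is an assumption for illustration: `BudgetedEnv`, `think_cost`, `penalty`, the diminishing-returns reward and the fixed "think this many steps every turn" policy are all hypothetical, not a description of any existing RL setup.

```python
class BudgetedEnv:
    """Hypothetical environment in which the agent's own thinking drains a shared energy budget."""

    def __init__(self, budget=10.0, think_cost=1.0, penalty=-100.0):
        self.energy = budget
        self.think_cost = think_cost
        self.penalty = penalty

    def step(self, think_steps):
        """Spend `think_steps` units of thinking, then act. Returns (reward, done)."""
        self.energy -= think_steps * self.think_cost
        if self.energy < 0:
            # The budget is overdrawn: large negative reward and the episode ends.
            return self.penalty, True
        # Assumed task reward: more thinking helps, with diminishing returns.
        return 1.0 - 0.5 ** think_steps, False


def run(think_steps, horizon=5):
    """Total reward of a fixed policy that thinks `think_steps` times on every turn."""
    env = BudgetedEnv()
    total = 0.0
    for _ in range(horizon):
        reward, done = env.step(think_steps)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, run(k))
```

In this toy setup, thinking twice per turn beats thinking once, but thinking three times per turn overdraws the budget partway through the episode and ends up far worse. An agent could in principle learn that trade-off only if the energy level (or something correlated with it) is part of what it observes.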