Impact measuresregularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us.
Currently, impact measurementregularization research focuses on two approaches:
Sequences on impact measurement:regularization:
Impact measures penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us.
To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define low impact"low impact" in a way that a computer can understand? How do you measure impact?impact? These questions are important for both prosaic and future AI systems: objective specification is hard, and we don't want AI systems to rampantly disrupt their environment. Furthermore, we don't want highly capable AI systems to permanently wrench control of the future from us.
Impact measures penalize an AI for affecting us too much.To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand? How do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard, and we don't want AI systems to rampantly disrupt their environment. Furthermore, we don't want highly capable AI systems to permanently wrench control of the future from us.
Impact measures penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand? How do you measure impact?impact? These questions are important for both prosaic and future AI systems: objective specification is hard, and we don't want AI systems to rampantly disrupt their environment. Furthermore, we don't want highly capable AI systems to permanently wrench control of the future from us.
Impact regularizersRegularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us.
Impact measures penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand? Howunderstand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard, and; we don't want AI systems to rampantly disrupt their environment. Furthermore,In the limit of goal-directed intelligence, seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us.
To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact"low impact in a way that a computer can understand? How do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard, and we don't want AI systems to rampantly disrupt their environment. Furthermore, we don't want highly capable AI systems to permanently wrench control of the future from us.