The Supervised Program for Alignment Research (SPAR) is a part-time, remote research program hosted by Kairos AI that pairs aspiring researchers with established researchers working to mitigate the risks posed by advanced AI, with a particular focus on AI safety and security, interpretability, biosecurity, AI policy, and societal impacts.
Metacognition is thinking about thinking: the capacity to monitor, evaluate, and govern one's own reasoning while it is happening, rather than only after the fact.
Training on narrow examples of misaligned behavior sometimes generalizes to broadly misaligned behavior, seemingly altering the assistant's goals or persona rather than merely instilling the specific trained behavior.
The agent then uses an unbounded proof search, which no current AI algorithm could tackle in reasonable time (although a human engineer could do it with a good deal of painstaking work).
"Current," here, is indexed to a decade ago, and can no longer be claimed confidently.
The Machine Alignment, Transparency, and Security (MATS) Program is an independent research and educational program that provides emerging researchers with mentorship, talks, and workshops, and connects them with the SF Bay Area and London AI safety research communities.
There are fundamental confusions about intelligent agents, that is, about minds that try to make the things they want happen. Some believe that resolving these confusions is necessary for AI alignment; others prefer more prosaic approaches, or different strategies entirely.
Here are some fundamental confusions that agent foundations tries to resolve:
Prosaic alignment is an approach to AI alignment that focuses on aligning AI systems built from current machine-learning techniques, without assuming that fundamentally new insights about intelligence will arrive first.
Payor's lemma is a theorem in mathematical logic that is similar to Löb's theorem.
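For reference, here are the two statements side by side in provability-logic notation, where $\Box P$ reads "$P$ is provable" (this is the standard formulation; consult the original sources for the exact statement used there):

$$\text{Löb's theorem:}\quad \text{if } \vdash \Box P \to P, \text{ then } \vdash P.$$

$$\text{Payor's lemma:}\quad \text{if } \vdash \Box(\Box P \to P) \to P, \text{ then } \vdash P.$$

The proof of Payor's lemma does not route through Löb's theorem, which is part of what leaves room for the probabilistic generalization discussed below.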
For the purposes of agent foundations, Payor's lemma has been proposed as an alternative to Löb's theorem, both because it is simpler and because it may admit a probabilistic generalization in a way that Löb's theorem does not. If this works out, it would give agents a way to carry out probabilistic versions of logical-decision-theory results, such as cooperating in the Prisoner's Dilemma when given each other's source code, now under uncertainty.
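As a rough sketch of the intended application, following the modal-fixpoint cooperation construction (simplified here; the symbols $a_i$ and $E$ are this sketch's notation, not necessarily the original's): let $E$ mean "everyone cooperates", and let each agent $i$ cooperate exactly when it can prove that provability of $E$ implies $E$:

$$a_i \leftrightarrow \Box(\Box E \to E), \qquad E \leftrightarrow \bigwedge_i a_i.$$

Since every $a_i$ unwinds to the same formula, $E \leftrightarrow \Box(\Box E \to E)$, so in particular $\vdash \Box(\Box E \to E) \to E$, and Payor's lemma yields $\vdash E$: cooperation is provable. The hoped-for probabilistic version would replace $\Box$ with a high-credence belief operator.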