Wikitag Dashboard — LessWrong

LESSWRONG
LW

Wikitag Dashboard — LessWrong

Value identification
- Edge instantiation
- Unforeseen maximums
- Ontology identification
  - Cartesian boundary
  - Human identification
- Inductive value learning
  - Ambiguity-querying
  - Moral uncertainty
    - Indifference
Patch resistance
- Nearest Unblocked ~~Neighbor~~Strategy
Corrigibility
- Anapartistic reasoning
  - Programmer deception
  - Early conservatism
  - Reasoning under confusion
- User maximization / Unshielded argmax
  - Hypothetical user maximization
Genie theory
- Limited AI
  - Weak optimization
    - Safe optimization measure (such that we are confident it has no Edge that secretly optimizes more)
      - Factoring of an agent by stage/component optimization power
    - 'Checker' smarter than 'inventor / chooser'
      - 'Checker' can model humans, 'strategizer' cannot
  - Transparency
  - Domain restriction
    - Behaviorism
  - Effable optimization (opposite of cognitive uncontainability; uses only comprehensible strategies)
    - Minimal concepts (simple, not simplest, that contains fewest whitelisted strategies)
- Genie preferences
  - Low-impact AGI
    - Minimum Safe AA (just flip off switch and shut down safely)
    - Safe impact measure
    - Armstrong-style permitted output channels
    - Shutdown utility function
  - Oracle utility function
    - Safe indifference?
  - Online checkability
    - Reporting without programmer maximization
  - Do What I Know I Mean
Superintelligent security (all subproblems placing us in adversarial context vs. other SIs)
- Bargaining
  - Non-blackmailability
  - Secure counterfactual reasoning
  - First-mover penalty / epistemic low ground advantage
  - Division of gains from trade
- Epistemic exclusion of distant SIs
  - Distant superintelligences can coerce the most probable environment of your AI
  - Breaking out of hypotheses
'Philosophical' problems
- One True Prior
  - Pascal's Mugging / leverage prior
  - Second-orderness
  - Anthropics
    - How would an AI decide what to think about QTI?
- Mindcrime
  - Nonperson predicates (and unblocked neighbor problem)
- Do What I Don't Know I Mean
  - CEV
- Philosophical competence
  - Unprecedented excursions

Misleading Encouragement / context change / treacherous designs for naive projects
- Programmer prediction & infrahuman domains hide complexity of value
- Context change problems
- Problems that only appear in advanced regimes
- Problem classes that seem debugged in infrahuman regimes and suddenly break again in advanced regimes
- Methodologies that only work in infrahuman regimes
- Programmer deception
Academic inadequacy
- 'Ethics' work neglects technical problems that need longest serial research times and fails to give priority to astronomical failures over survivable small hits, but 'ethics' work has higher prestige, higher publishability, and higher cognitive accessibility
- Understanding of big technical picture currently very rare
  - Most possible funding sources cannot predict for themselves what might be technically useful in 10 years
  - Many possible funding sources may not regard MIRI as trusted to discern this
- Noise problems
  - Ethics research drowns out technical research
    - And provokes counterreaction
    - And makes the field seem nontechnical
  - Naive technical research drowns out sophisticated technical research
    - And makes problems look more solvable than they really are
    - And makes tech problems look trivial, therefore nonprestigious
    - And distracts talent/funding from hard problems
  - Bad methodology louder than good methodology
    - So projects can appear safety-concerned while adopting bad methodologies
- Future adequacy counterfactuals seem distant from the present regime
(To classify)
- Coordinative AI development hypothetical

Jonas Hallgren	Collective Intelligence (0)	9h
Joe Kwon	Secret Loyalties (1)	1d
Joschka Braun	Exploration Hacking (2)	2d
Ruby	LessOnline (11)	4d

LESSWRONG
LW

LESSWRONG
LW

Wikitags in Need of Work

Newest Wikitags

Wikitag Voting Activity

Recent Wikitag Activity

Wikitags in Need of Work

Newest Wikitags

Wikitag Voting Activity

Recent Wikitag Activity