LESSWRONG
Goal-Directedness
• Applied to Think carefully before calling RL policies "agents" by TurnTrout 15d ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 1mo ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 1mo ago
• Applied to Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics by phelps-sg 1mo ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 2mo ago
• Applied to Agentized LLMs will change the alignment landscape by Seth Herd 2mo ago
• Applied to Imagine a world where Microsoft employees used Bing by Christopher King 3mo ago
• Applied to GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2 by Christopher King 3mo ago
• Applied to 100 Dinners And A Workshop: Information Preservation And Goals by Stephen Fowler 3mo ago
• Applied to Does GPT-4 exhibit agency when summarizing articles? by Christopher King 3mo ago
• Applied to More experiments in GPT-4 agency: writing memos by Christopher King 3mo ago
• Applied to A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world! by Christopher King 3mo ago
• Applied to An Appeal to AI Superintelligence: Reasons to Preserve Humanity by James_Miller 3mo ago
• Applied to Super-Luigi = Luigi + (Luigi - Waluigi) by Alexei 3mo ago
• Applied to The Waluigi Effect (mega-post) by Cleo Nardo 3mo ago
• Applied to Evil autocomplete: Existential Risk and Next-Token Predictors by Yitz 4mo ago
• Applied to How evolutionary lineages of LLMs can plan their own future and act on these plans by Roman Leventov 5mo ago
• Applied to The Alignment Problem from a Deep Learning Perspective (major rewrite) by SoerenMind 5mo ago