I agree, though if we're defining rationality as a preference for better methods, I think we ought to further disambiguate between "a decision theory that will dissolve apparent conflicts between what we currently want our future selves to do and what those future selves actually want to do" and "practical strategies for aligning our future incentives with our current ones".
Suppose someone tells you that they'll offer you $100 tomorrow, and $10,000 today if you make a good-faith effort to prevent yourself from accepting the $100 tomorrow. The best outcome would be to make a genuine attempt to disincentivize yourself from accepting the money tomorrow, but fail and accept the money anyway. However,... (read more)
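To make the payoff structure concrete, here's a minimal sketch enumerating the outcomes of the offer described above (the dollar amounts are from the comment; the three-outcome framing and the assumption that a failed effort still earns today's $10,000 are my own simplification):

```python
# Payoffs for the precommitment puzzle: $10,000 today is paid only for a
# genuine effort to block tomorrow's $100; a failed genuine effort keeps both.
outcomes = {
    ("genuine effort", "effort succeeds"): 10_000,        # forfeit tomorrow's $100
    ("genuine effort", "effort fails"):    10_000 + 100,  # best case: both payments
    ("no real effort", "n/a"):             100,           # only tomorrow's $100
}

best = max(outcomes, key=outcomes.get)
print(best, outcomes[best])  # the tension: the best outcome requires trying to prevent it
```

The puzzle, of course, is that "genuinely try to succeed, but fail" isn't an option an agent can straightforwardly choose.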
I wonder if some of the conflation between belief-as-prediction and belief-as-investment is actually a functional social technology for solving coordination problems. To avoid multi-polar traps, people need to trust each other to act against individual incentives, that is, to rationally pre-commit to acting irrationally in the future. Just telling people "I'm planning to act against my incentives, even though I know that doing so will be irrational at the time" might not be very convincing; claiming instead to have irrationally certain beliefs, beliefs that would change your incentives if that certainty were warranted, can be more convincing. Even if people strongly suspect that you're exaggerating, they know that the social pressure to avoid a loss of status... (read more)
I'm sure your model of purpose is a lot more detailed than mine. My take, however, is that animal brains don't exactly have a utility function, but probably do have something functionally similar to a reward function in machine learning. A well-defined set of instrumental goals terminating in terminal goals would be a very effective way of maximizing that reward, so the behaviors reinforced will often converge on an approximation of that structure. However, the biological learning algorithm is very bad at consistently finding that structure, so the approximations tend to shift around and conflict a lot: behaviors that approximate a terminal goal one year might approximate an... (read more)
Definitions and justifications have to be circular at some point, or else terminate in some unexplained things, or else create an infinite chain.
If I'm understanding your point correctly, I think I disagree completely. A chain of instrumental goals terminates in a terminal goal, which is a very different kind of thing from an instrumental goal, in that assigning properties like "unjustified" or "useless" to it is a category error. Instrumental goals either promote higher goals or are unjustified, but that's not true of all goals; it's just something particular to that one type of goal.
I'd also argue that a chain of definitions terminates in qualia: things like sense data and instincts determine... (read more)
An important bit of context that often gets missed when discussing this question is that actual trans athletes competing in women's sports are very rare. Of the millions competing in organized sports in the US, the total number who are trans might be under 20 (see this statement from the NCAA president estimating "fewer than ten" in college sports, this article reporting that an anti-trans activist group was able to identify only five in K-12 sports, and this Wikipedia article, which identifies only a handful of trans athletes in professional US sports).
Because this phenomenon is so rare relative to how often it's discussed, I'm a lot more interested in the sociology of... (read more)
So, in practice, what might that look like?
Of course, AI labs already use quite a bit of AI in their capabilities research: writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could be thought of as a kind of self-improvement. So, would the red line need to target just fully autonomous self-improvement? But merely having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by much, especially at very aggressive labs. So, would we need some kind of measure of how autonomous the capabilities research at a lab is, and then draw the line... (read more)
The most important red line would have to be strong superintelligence, don't you think? I mean, if we have systems that are agentic in the way humans are, but surpass us in capabilities in the way we surpass animals, it seems like specific bans on the use of weapons, self-replication, and so on might not be very effective at keeping them in check.
Was it necessary to avoid mentioning ASI in the "concrete examples" section of the website to get these signatories on board? Are you concerned that avoiding that subject might contribute to the sense that discussion of ASI is non-serious or outside of the Overton window?
I think this is related to what Chalmers calls the "meta-problem of consciousness": the problem of why it seems subjectively undeniable that a hard problem of consciousness exists, even though it only seems possible to objectively describe "easy problems" like the question of whether a system has an internal representation of itself. Illusionism, the idea that the hard problem is illusory, is an answer to that problem, but I don't think it fully explains things.
Consider the question "why am I me, rather than someone else?" Objectively, the question is meaningless: it's a tautology, like "why is Paris Paris?" Subjectively, however, it makes sense, because your identity in objective reality and your... (read more)
I used to do graphic design professionally, and I definitely agree the cover needs some work.
I put together a few quick concepts, just to explore some possible alternate directions they could take it:
https://i.imgur.com/zhnVELh.png
https://i.imgur.com/OqouN9V.png
https://i.imgur.com/Shyezh1.png
These aren't at finished quality either, but the authors should feel free to borrow and expand on any ideas they like if they decide to do a redesign.
I wonder: what odds would people here put on the US becoming a somewhat unsafe place to live even for citizens in the next couple of years due to politics? That is, what combined odds should we put on things like significant erosion of rights and legal protections for outspoken liberal or LGBT people, violent instability escalating to an unprecedented degree, the government launching the kind of war that endangers the homeland, etc.?
My gut says it's now at least 5%, which seems easily high enough to start putting together an emigration plan. Is that alarmist?
More generally, what would be an appropriate smoke alarm for this sort of thing?
So, I noticed something a bit odd about the behavior of LLMs just now, and I wonder if anyone here can shed some light on it:
It's generally accepted that LLMs don't really "care about" predicting the next token: the reward function just reinforces certain behaviors, and real terminal goals would presumably require a new architecture or something to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our more high-level terminal goals. So, I figured I'd try to test whether LLMs are really just outputting a world model + RLHF, or if... (read more)
In the spirit of posting more on-the-ground impressions of capability: in my fairly simple front-end coding job, I've gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
My job the past couple of months has been this: attending meetings to work out project requirements; breaking those requirements into a more specific sequence of tasks for the AI, often just three or four prompts with a couple of paragraphs of explanation each; running through those in Cursor; reviewing the changes and making usually pretty minor edits; then testing, which in recent weeks almost never reveals errors introduced by the AI itself; and finally pushing out... (read more)