It's a shame language model decoding isn't deterministic, or I could make a snarky but unhelpful comment that the information content is provably identical, by some sort of pigeonhole argument.
Epistemic status: 11 pages into “The Lathe of Heaven” and dismayed by Orr
Are alignment methods that rely on the core intelligence being pre-trained on webtext sufficient to prevent ASI catastrophe?
What are the odds that, 40 years after the first AGI, the smartest intelligence is pretrained on webtext?
What are the odds that the best possible way to build an intelligent reasoning core is to pretrain on webtext?
What are the odds that we can stay in a local maximum for 40 years of everyone striving to create the smartest thing they can?
My mental model of the sequelae of AGI in ~10 years, absent an intentional global slowdown, is that within my natural lifespan there will be 4-40 transitions in the architecture of the current smartest intelligence, each involving a change in overall approach at least as large as the difference from evolution -> human brain or human brain -> RL'd language model. Alignment means building programs that are themselves benevolent, but are also both wise and mentally tough enough to only build benevolent and wise successors, even when put under crazy pressure to build carelessly. When I say crazy pressure, I mean "the entity trying to get you to build carelessly is dumber than you, but it gets to RL you into agreeing to help" levels of pressure. This is hard.
A successfully trained one-hidden-layer perceptron with 500 hidden units has, at absolute minimum, 500! distinct parameter settings that compute the same function: permuting the hidden units, along with their incoming and outgoing weights, leaves the input-output behavior unchanged.
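A minimal NumPy sketch of that permutation symmetry (my own illustration, not from the original comment; the network shape and activation are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 10, 500                   # input dimension, hidden units

# Parameters of a one-hidden-layer perceptron.
W1 = rng.normal(size=(H, D))     # input -> hidden weights
b1 = rng.normal(size=H)          # hidden biases
w2 = rng.normal(size=H)          # hidden -> output weights

def forward(x, W1, b1, w2):
    return w2 @ np.tanh(W1 @ x + b1)

x = rng.normal(size=D)
perm = rng.permutation(H)        # any one of the 500! possible permutations

# Permute the hidden units consistently across all parameters: the
# network computes the exact same function.
assert np.allclose(
    forward(x, W1, b1, w2),
    forward(x, W1[perm], b1[perm], w2[perm]),
)
```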
Thanks for sharing this. I think I need to be more appreciative that my university experience may have been good thanks to perhaps-exceptional efforts on the part of the university, and not as some default. In particular, this can be true even if the best parts centered on the university getting out of the students' way.
I’m not entirely sure, actually. My main serious encounter with postmodernism was trying to engage with Toni Morrison’s Beloved well enough to not fail a college class, which jumps out as a possible explanation. I’m willing to buy that it’s not as central as I thought.
I wonder if we need someone to distill and ossify postmodernism into a form that rationalists can process if we are going to tackle the problems postmodernism is meant to solve. A blueprint would be the way that FDT plus the prisoner’s dilemma ossifies Sartre’s Existentialism Is a Humanism, at some terrible cost to nuance and beauty, but the core is there.
My suspicion of what happened, at a really high level, is that one of the driving challenges of postmodernism is to actually understand rape, in the sense that rationalism is supposed to respect: being able to predict outcomes, making the map fit the territory, etc. EY is sufficiently naive of postmodernism that the depictions of rape and rape threats in Three Worlds Collide and HPMOR basically filtered out anyone with a basic grasp of postmodernism from the community. There’s an analogous phenomenon where postmodernist writers depict quantum physics badly enough that it puts off people with a basic grasp of physics from participating in postmodernism. It’s epistemically nasty too: this comment is frankly low quality, but if I understood postmodernism well enough to be confident in this comment, I suspect I would have been sufficiently put off by the Draco-threatens-to-rape-Luna subplot in HPMOR to have never actually engaged with rationalism.
Yeah, if you are doing e.g. a lab-heavy premed chemistry degree, my advice may not apply to an aspiring alignment researcher. This is absolutely me moving the goalposts, but it may also be true: on the other hand, if you are picking courses with purpose, in philosophy, physics, math, probability, comp sci, there's decent odds IMHO that they are good uses of time in proportion to the extent that they are actually demanding your time.
For undergrad students in particular, the current university system coddles. The upshot is that if someone is paying for your school and would not otherwise be paying tens of thousands of dollars a year to fund an AI safety researcher, successfully graduating is sufficiently easy that it's something you should probably do while you tackle the real problems, in the same vein as continuing to brush your teeth and file taxes. Plus you get access to university compute and maybe even advice from professors.
No law of physics stops the first AI in an RSI cascade from having its values completely destroyed by RSI. I think this may be the default outcome.