I am not an expert in starting new journals, but I think that one is certainly needed. And it needs to be mainstream, which means in particular: listed in Clarivate SCIE/SSCI, Scopus, etc. It should apply for an official IF and so on.
Instead of 5% here and 5% there, we should consider a baseline of how much societal effort goes into maintaining cemeteries/necropolises. This differs from society to society, and there are choices to be made here, but it's hard to imagine a civilization without them.
I think that's a good way of phrasing it, except that I would emphasize that these are two different states of knowledge, not necessarily two different states of the world.
I didn't think it would work out to the maximum entropy distribution even in your first case, so I worked out an example to check:
Suppose we have a three-sided die, that can land on 0, 1 or 2. Then suppose we are told the die was rolled several times, and the average value was 1.5. The maximum entropy distribution is (if my math is correct) probability 0.116 for 0, 0.268 for 1 and 0.616 ...
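For anyone who wants to check that arithmetic, here is a minimal sketch of the computation (the use of scipy's root-finder and the bracketing interval are my own choices, not part of the original comment):

```python
# Maximum-entropy distribution over faces {0, 1, 2} subject to a mean of 1.5.
# The maxent solution has the form p_k proportional to exp(lam * k); we solve
# for lam so that the mean constraint holds, then normalize.
import numpy as np
from scipy.optimize import brentq

faces = np.array([0, 1, 2])
target_mean = 1.5

def mean_given(lam):
    w = np.exp(lam * faces)
    return (w / w.sum()) @ faces

lam = brentq(lambda l: mean_given(l) - target_mean, -10, 10)
p = np.exp(lam * faces)
p /= p.sum()
print(np.round(p, 3))  # approximately [0.116, 0.268, 0.616]
```

This agrees with the numbers above.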
Epoch AI has a map of frontier AI datacenters: https://epoch.ai/data/data-centers/satellite-explorer
The things you're saying may be true, but I'm not sure the Slytherin necklace is a super good example. I feel like she put on the necklace that morning and had a moment where she thought "haha this is Slytherin-coded," and she wanted to share that feeling with you in a playful way. I doubt she was thinking "when I wear this necklace, I predict that people will associate me with Slytherin. I shall now test this hypothesis by asking John."
My very uninformed model of this girl says that if she read this post, she'd kind of roll her eyes and say "lol it really wasn't that deep." But only she could say for sure.
Got it. Okay thanks!
Yep, e.g. donations sooner are better for getting endorsements. Especially for Bores and somewhat for Wiener, I think.
you could plausibly do this, and it would certainly reduce maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.
Good point.
How it started: pics or it didn't happen.
How it's going: IRL or it didn't happen.
I think there is a window of opportunity for humans to create a reputation for legitimacy and a venue for official information. Consider Neil deGrasse Tyson and the recent flat earth fake. People know where to check to see if he really changed his mind. He has a valid place for things to appear, and a reputation.
Then consider a purported leaked recording of a politician. There's no way to validate or invalidate it. It is a leak, so you expect the politician to deny it whether ...
IIUC there are two scenarios to be distinguished:
One is that the die has bias p unknown to you (you have some prior over p) and you use i.i.d. rolls to estimate the bias as usual & get the maxent distribution for a new draw. The draws are independent given p but not independent given your prior, so everything works out.
The other is that the die is literally i.i.d. over your prior. In this case everything from your argument goes through: whatever bias/constraint you happen to estimate from your outcome sequence doesn't say anything about a new i.i.d. draw, because they're uncorrelated; the new draw is just another sample from your prior.
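A toy numerical contrast between the two cases (my own sketch; the coin-instead-of-die setup and the Beta(1,1) prior are assumptions for illustration):

```python
# Case 1: unknown bias p with a prior over it. Draws are i.i.d. given p but NOT
# independent under your prior, so observations shift the predictive for a new draw.
# Case 2: the draws are literally i.i.d. under your beliefs (p known), so the same
# observations tell you nothing about a new draw.

heads, tails = 8, 2  # some observed outcome sequence

# Case 1: Beta(1, 1) prior over p; the posterior predictive for the next Heads
# is (a + heads) / (a + b + heads + tails), pulled toward the observed frequency.
a, b = 1, 1
print((a + heads) / (a + b + heads + tails))  # 0.75

# Case 2: p is known to be 0.5; the 8/2 outcome changes nothing about the next draw.
print(0.5)
```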
I know many people whose lives were radically changed by The Lord of the Rings, The Narnia Chronicles, Star Wars, or Ender's Game.
The first three spawned a vast juvenile fantasy genre which convinces people that they're in a war between pure good and pure evil, in which the moral thing to do is always blindingly obvious. (Star Wars at least had a redemption arc, and didn't divide good and evil along racial lines. In LotR and Narnia, as in Marxism and Nazism, the only possible solution is to kill or expel every member of the evil races/classes.) ...
On a quick glance it looks like the intention is (partially) to promote a memecoin: https://www.ai-2028.com/today/coin
My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.
We already have viXra, with its own "can of worms" to say the least, https://en.wikipedia.org/wiki/ViXra.
And if I currently go to https://vixra.org/, I see that they do have the same problem, and this is how they are dealing with it:
...Notice: viXra.org only accept
Thanks! And thanks for reading!
I talk some about MIRI's 2015 misstep here (and some here). In short, it is hard to correctly balance arbitrary top-level goals against an antinatural goal like shutdownability or corrigibility, and trying to stitch corrigibility out of sub-pieces like shutdownability is like trying to build an animal by separately growing organs and stitching them together -- the organs will simply die, because they're not part of a whole animal. The "Hard Problem" is the glue that allows the desiderata to hold together.
I discuss a range of ...
Every once in a while I think about Robert Freitas' 1984 essay Xenopsychology, in particular his Sentience Quotient (SQ) idea:
...It is possible to devise a sliding scale of cosmic sentience universally applicable to any intelligent entity in the cosmos, based on a "figure of merit" which I call the Sentience Quotient. The essential characteristic of all intelligent systems is that they process information using a processor or "brain" made of matter-energy. Generally the more information a brain can process in a shorter length of time, the more intellige
Just came across this old philosophy class paper of mine, basically arguing against eliminativism in philosophy of mind: https://docs.google.com/document/d/1FLGF4bKj0blFyn8JPeXa73DBhigNKX3Wecujcv4AOjQ/edit?usp=sharing
I still stand by it I think. Curious if anyone has thoughts. Feel free to leave comments in the doc.
Sounds like an excellent idea. The Journal of Existential Risk of AI.
Someone please explain
His actual top objection is that even if we do manage to get a controlled and compliant ASI, that is still extremely destabilizing at best and fatal at worst.
Michael Nielsen brings forth a very valid concern, which should have made a lot of Alignment researchers update their beliefs already.
We currently don't know what a benevolent OR compliant ASI would look like, or how it may end up affecting humanity (and our future agency). Worse, I doubt we can distinguish success from failure.
Richard Rorty argued that stories, rather than ethical principles, are at the heart of morality. For Rorty, the basic question of morality is which groups to recognize as persons entitled to respect. Stories about women and slaves made privileged people recognize them as people who matter.
Within Rorty's framing, it feels like The Wild Robot, Wall-E, and stories like that prime us to (eventually) recognize the personhood of robots. I suppose those would be important stories if we succeeded in creating conscious entities that desire to continue living*, but ...
absolutism, treating their conclusions and the righteousness of their cause as obvious, and assuming it should override ordinary business considerations.
It doesn't take certainty in any position to criticize driving at half-speed.
My guess would be that OpenAI and Anthropic both lowball their financial estimates for strategic reasons. Better for your already-very-ambitious targets to be exceeded repeatedly, than to propose even one so-ambitious-you-sound-like-an-insane-cult target which you then fail to meet.
Some subtle signals perhaps?
Earnest question: For both this & donating to Alex Bores, does it matter whether someone donates sooner rather than a couple months from now? For practical reasons, it will be easier for me to donate in 2026--but if it will have a substantially bigger impact now, then I want to do it sooner.
Sure, but if we put a third "if" on top (namely, "it's a representation of our credences, but also both hypotheses are nosy neighbors"), doesn't that undo the second "if" and bring us back to the first?
I agree. I am a psychologist (cognitive not clinical) by training, who reads technical articles, and I see those parallels constantly.
This put me in mind of writing a short post titled something like "alignment includes psychology, whether we like it or not". My previous short form on psychology and alignment was my most downvoted ever. I think it's a repulsive concept to the types of people who work on alignment, for bad reasons and good. I think there are good reasons for being horrified if alignment requires a psychological approach. Psychology knows ve...
Is this equally true of GPT-5 and Sonnet 4.5? They're the first models trained with reducing sycophancy as one objective.
I agree in general.
I could do better by imagining that I will have infinitely many independent rolls, and then updating on that average being exactly 2.0 (in the limit). IIUC that should replicate the max relative entropy result (and might be a better way to argue for the max relative entropy method), but I have not checked that myself.
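For what it's worth, here is a rough numerical check of that claim (my own sketch; I'm assuming a standard 6-sided die with a uniform distribution over faces, which may not match the original example exactly):

```python
# Condition N i.i.d. uniform die rolls on their average being exactly 2.0, and look at
# the induced distribution of a single roll (one of the N). As N grows, this should
# approach the maximum-entropy distribution on {1,...,6} with mean 2.0.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def count_sequences(n, total):
    """Number of length-n face sequences summing to `total` (simple dynamic programming)."""
    counts = {0: 1}
    for _ in range(n):
        new = {}
        for s, c in counts.items():
            for f in faces:
                new[s + f] = new.get(s + f, 0) + c
        counts = new
    return counts.get(total, 0)

def first_roll_given_average(n, avg):
    total = int(round(avg * n))
    denom = count_sequences(n, total)
    return np.array([count_sequences(n - 1, total - f) / denom for f in faces])

for n in (5, 20, 60):
    print(n, np.round(first_roll_given_average(n, 2.0), 3))

# Maxent distribution with mean 2.0 for comparison: p_f proportional to exp(lam * f).
lam = brentq(lambda l: (np.exp(l * faces) @ faces) / np.exp(l * faces).sum() - 2.0, -5, 5)
p = np.exp(lam * faces)
print("maxent:", np.round(p / p.sum(), 3))
```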
I had thought about something like that, but I'm not sure it actually works. My reasoning (which I expect might be close to yours, since I learned about this theorem in a post of yours) was that by the entropy concentration theorem, most outco...
Thanks :)
I will reveal the true answer to 2 in about a week, in case anyone else wants to take a guess.
I mistakenly believed this was common knowledge by now. Sam Altman's history goes way back.
I recommend reading 'Empire of AI' by journalist Karen Hao,
for an extensive breakdown of all the controversies relating to Altman, OpenAI, and the US AI boom over the last few years. If anyone reads it (recommended), you might not agree with all of the analysis, especially re: whom to blame for what, but it is factual.
Wait, do you think value uncertainty is equivalent/reducible to uncertainty about the correct prior?
Yep. Value uncertainty is reduced to uncertainty about the correct prior via the device of putting the correct values into the world as propositions.
Would that mean the correct prior to use depends on your values?
If we construe "values" as preferences, this is already clear in standard decision theory; preferences depend on both probabilities and utilities. UDT further blurs the line, because in the context of UDT, probabilities feel more like a "carin...
In the example in the post, what would you say is the "prior distribution over sequences of results"?
I don't actually know.
If it's a binary experiment, like a "biased coin" that outputs either Heads or Tails, an appropriate distribution is Laplace's Rule of Succession (like I mentioned). Laplace's Rule has a parameter p that is the "objective probability" of Heads, in the sense that if we know p, our probability for each result being Heads is p, independently. (I don't think it makes sense to think of p as an actual...
Good questions, those are exactly the sorts of things which confused me when learning this stuff! And sometimes still do confuse me.
Even if you don't know anything other than the average value, you can still take your distribution over sequences of results, update it on this information (eliminating the possible outcome sequences that don't have this average value), and then find the distribution P(NextResult|AverageValue) by integrating P(NextResult|PastResults)P(PastResults|AverageValue) over the possible PastResults.
This part is the easiest to answer.
Su...
When I try to understand the position you're speaking from, I suppose you're imagining a world where an agent's true preferences are always and only represented by their current introspectively accessible probability+utility,[1] whereas I'm imagining a world where "value uncertainty" is really meaningful (there can be a difference between the probability+utility we can articulate and our true probability+utility).
If 50% rainbows and 50% puppies is indeed the best representation of our preferences, then I agree: maximize rainbows.
If 50% rainbows and 50...
I agree. I've been trying to discuss some terminology that I think might help, at least with discussing the situation. I think "AI" is generally a vague and confusing term and what we should actually be focused on are "Outcome Influencing Systems (OISs)", where a hypothetical ASI would be an OIS capable of influencing what happens on Earth regardless of human preferences. However, humans are also OISs, as are groups of humans, and in fact the "competitive pressure" you mention is a kind of very powerful OIS that is already misaligned and in many ways supe...
I would chalk this up to we simply don't know each other as well as we think we do. We think we're good at interpreting facial expressions, body language and style choices until the rare instances where we can check our assumptions against what the observed person is actually thinking/feeling. Society and culture (context?) probably play a big part in our understanding or lack of understanding.
I am fascinated by how often I read something about LLMs and it seems to illustrate something about human psychology. I wonder how many psychologists think about these things. (I suspect not many, because psychologists typically don't read technical articles about LLMs.)
For example, in "GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash" the part "Bias-augmented Consistency Training", specifically "Train the model via SFT to give the clean response ... when shown the wrapped prompt"... that reminds me strongly of "Asch’s Co...
Other than Mycroft being a result of spontaneous consciousness, the computer in Heinlein's "The Moon is a Harsh Mistress" was not too far off from being an LLM, as was Minerva in "Time Enough for Love".
I agree on both points. To the first, I'd like to note that classifying "kinds of illegibility" seems worthwhile. You've pointed out one example, the "this will affect future systems but doesn't affect systems today". I'd add three more to make the possibly incomplete set:
These posts are not a particularly representative window into my dating efforts/thoughts/etc.
The main driver of the posts is me being like "man, why is my memetic environment feeding me all this stuff about dating which just clearly isn't true?", and sometimes I get sufficiently pissed off at my memetic environment to push back.
...I like to go salsa dancing and I feel a lot more relaxed and playful when doing it compared to when I was "looking" for romance? I just bring a different more secure energy and I just stop worrying and start vibing? I agree with you
The "morality is scary" problem of corrigible AI is an interesting one. Seems tricky to at least a first approximation in that I basically don't have an estimate on how much effort it would take to solve it.
Your rot13 suggestion has the obvious corruption problem, but also has the problem of public relations for the plan. I doubt it would be popular. However, I like where your head is at.
My own thinking on the subject is closely related to my "Outcome Influencing System (OIS)" concept. Most complete and concise summary here. I should write an explainer pos...
I see a lot of people dismissing the agent foundations era and I disagree with it. Studying agents seems even more important to me than ever now that they are sampled from a latent space of possible agents within the black box of LLMs.
To throw out a crux, I agree that if we have missed opportunities for progress towards beneficial AI by trying to avoid advancing harmful capabilities, that would be a bad thing, but my internal sense of the world suggests to me that harmful capabilities have been advanced more than opportunities have been missed. But unfortunately, that seems like a difficult claim to try to study in any sort of unbiased, objective way, one way or the other.
I love this idea, it feels like it would also work for a lot of non-fiction, and I could see this being a part of a traditional book club too.
Asking even a good friend to take the time to read The Sequences (aka Rationality A-Z) is a big ask. But how else does one absorb the background and culture necessary if one wants to engage deeply in rationalist writing? I think we need alternative ways to communicate the key concepts that vary across style and assumed background. If you know of useful resources, would you please post them as a comment? Thanks.
Some different lenses that could be helpful:
“I already studied critical thinking in college, why isn’t this enough?”
“I’m already a practicing
This is a good point of view. What we have is a large sociotechnical system moving towards global catastrophic risk (GCR). Some actions cause it to accelerate or remove brakes, others cause it to steer away from GCR. So "capabilities vs alignment" is directly "accelerate vs steer", while "legible vs illegible" is like making people think we can steer, even though we can't, which in turn makes people ok with acceleration, and so it results in "legible vs illegible" also being "accelerate vs steer".
The important factor there is "people think we can steer". I...
It seems that your argument is based on high confidence in a METR time-horizon doubling time of roughly 7 months. But the available evidence suggests the doubling time is significantly lower.
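As a toy illustration of how sensitive the extrapolation is to the assumed doubling time (hypothetical numbers, not METR's actual data):

```python
# Toy extrapolation: time horizon h(t) = h0 * 2**(t / doubling_time).
# Hypothetical starting horizon of 2 hours; compare a 7-month vs a 4-month doubling time.
h0_hours = 2.0
for doubling_months in (7, 4):
    for t_months in (12, 24, 36):
        horizon = h0_hours * 2 ** (t_months / doubling_months)
        print(f"doubling = {doubling_months} mo, t = {t_months} mo: {horizon:8.0f} hours")
```

Over three years, the two assumptions differ by more than an order of magnitude, which is why the exact doubling time matters so much here.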
In recent years we have observed shorter doubling times:
And what we know about labs' internal models suggests this faster trend is holding up:
An important piece of evidence is OpenAI’s Gold performance at the International Mathematics Olympiad (IMO):
[Link to donate; or consider a bank transfer option to avoid fees, see below.]
Nancy Pelosi has just announced that she is retiring. Previously I wrote up a case for donating to Scott Wiener, an AI safety champion in the California legislature who is running for her seat, in which I estimated a 60% chance that Pelosi would retire. While I recommended donating on the day that he announced his campaign launch, I noted that donations would look much better ex post in worlds where Pelosi retires, and t...
Or, why do we not salt ice cream?
I consider it pretty normal to encounter salt as an integral component of fancy ice cream flavors, but my biases are formed from places like https://saltandstraw.com/collections/all-flavors
Your plan is like a miniature version of what all the big AI companies are doing or will be doing...
Ok, so from a quick look I find this article on trading with ants unusually weak.
"Surveillance and spying"
Yes, but ants couldn't possibly understand anything we would be looking for. It's not just that they don't have language; they have a fundamentally lower level of understanding. They couldn't tell us "are the Chinese building new submarines?" They also couldn't perform these tasks, since ants can't follow any human orders; they are too stupid. An ant doesn't just go off and do some newly specified job, no, they do the same stuff every day, like looking...
You didn't actually answer the question posed, which was "Why couldn't humans and ASI have peaceful trades even in the absence of empathy/love/alignment to us rather than killing us?" and not "Why would we fail at making AIs that are aligned/have empathy for us?"
I don't know what Anthropic's official way of thinking about these things is, but to me, actually creating "a country of geniuses in a data center" is not an event that you can fit into a forecast of future earnings. It's an event that should lead rapidly to superintelligence, singularity, and the outright replacement of the world as we know it, by some new order of being. It doesn't surprise me that they would leave it out of their financial estimates.
Hm, I am unsure how much to believe this, even though my intuitions go the same way as yours. As a correlational datapoint, I tracked my success from cold approach and the time I've spent meditating (including a 2-month period of usually ~2 hours of meditation/day), and don't see any measurable improvement in my success rate from cold approach:
(Note that the linked analysis also includes a linear regression of slope -6.35e-08, but with p=0.936, so could be random.)
In cases where meditation does stuff to your vibe-reading of other people, I would guess that...
I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without steering---but not the other way around---because of the way time works.
Alternatively, for learning your brain can start out in any given configuration, and it will end up in the same (small set of) final configuration (one that reflects the world); for steering the world can start out in any given configuration, and it will e...
A large academic literature exists on how people who read fiction have more empathy. Granted, causality could go both directions. See https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=people+who+read+lots+of+fiction+have+more+empathy&btnG=
Fair question. It might have been better to phrase this as "Something ASI won't have towards us without much more effort and knowledge than we are currently putting into making ASI be friendly."
The answer is *gestures vaguely at entire history of alignment arguments* that I agree with the Yudkowsky position. To roughly summarise:
Empathy is a very specific way of relating to other minds, and which isn't even obviously well-defined when the two minds are very different; e.g. what does it mean to have empathy towards an ant, or a colony of ants? And humans an...
Maybe you're kinda trying to signal preference for comfy clothes in addition to that by deliberately trying to choose clothes that someone would choose iff they prioritize comfiness above all else. Not that I have any specific evidence of that, just putting a hypothesis on the table.
The linked post lists 19 economically valuable things ants could trade to us, if we could communicate with them.
Agreed. We don't trade with ants because we can't. If we could, there are lots of mutually profitable trades we could make.
The main reasons not to are that we have some level of empathy/love towards nature and animals, something ASI won't have towards us.
Why are you so confident about that?
Thanks, that's good to hear. What form does the pledge take? Do you have a DAF that contains half your shares? When do you think the next liquidation opportunity might be? (I guess you weren't eligible for the one in May[1]?)
I'm disappointed that no one (EA-ish or otherwise) seems to have done anything interesting with that liquidation opportunity.
I've been a bit confused about "steering" as a concept. It seems kinda dual to learning, but why? It seems like things which are good at learning are very close to things which are good at steering, but they don't always end up steering. It also seems like steering requires learning. What's up here?
I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without ...
To some extent "goodness" is some ever moving negotiated set of norms of how one should behave.
I notice that when I use the word "good" (or envoke this consept using other words such as "should"), I don't use it to point to the existing norms, but as a bid for what I think these norms should be. This sometimes overlap with the existing norms and sometimes not.
E.g. I might say that it's good to allow lots of diffrent subcultures to co-exist. This is a vote for a norm where peopel who don't my subculture leave me and my firends alone, in exchange for us leav...
In his MLST podcast appearance in early 2023, Connor Leahy describes Alfred Korzybski as a sort of "rationalist before the rationalists":
...Funny story: rationalists actually did exist, technically, before or around World War One. So, there is a Polish nobleman named Alfred Korzybski who, after seeing horrors of World War One, thought that as technology keeps improving, well, wisdom's not improving, then the world will end and all humans will be eradicated, so we must focus on producing human rationality in order to prevent this existential catastrophe. This
Yup indeed! See the other comment thread below
I edited the post to reflect this! (pun intended)
...In this example, Mr. A has learned the average numbers of red, yellow, and green orders for some past days and wants to update his predictions of today's orders on this information. So he decides that the expected values of his distributions should be equal to those averages, and that he should find the distribution that makes the least assumptions, given those constraints. I at least agree that entropy is a good measure of how little assumptions your distribution makes. The point I'm confused about is how you get from "the average of this number in past o
Went to the kitchen and tried to fill a bowl with water. I think you are right; I underestimated how easy it is to get to see a reflection in water. I believe it is unlikely for someone to spend a lifetime without seeing their face (blind people apart). Maybe still possible in arid desert areas, or for people living in the Arctic?
We have demonstrated that steganography in terms of internal activations is indeed possible: a model can embed a hidden message within its internal representations while producing a coherent public response, which is a necessary condition for persistent hidden reasoning.
However, in our experiments, this hidden reasoning is not actually hidden — it is contained within the activations over the instruction itself. It seems that we can sleep peacefully if we use small LLMs. But…
Interesting idea and setup, especially the use of the translator ...
I agree with everything you've said. If anything, I think the effect is underrated because it's socially taboo to admit we've been majorly influenced by fiction. We all want to convey that we are Very Serious People who make decisions by reading serious scientific papers, not that we got into environmentalism because we watched Fern Gully as a kid, or whatever.
Part of the challenge with using fiction to persuade people is that fiction is often most effective for conveying views when it's not being explicitly didactic, e.g., compare Soviet and Chinese propa...
The same concept was independently invented by a LARP organiser I know. Unfortunately I strongly dislike the words they chose, so I will not repeat them. But it occurs to me that the concept of "final responsibility", or "the buck stops here", is so universally useful that it's weird that there isn't some more common term for it.
As several commenters here have said, the business owner example isn't a great fit for heroic responsibility. The core is taking responsibility for things that aren't your job, that you are not socially expected to be responsible for, because you have decided that the thing needs to be done.
The archetypal fictional example is the hero who raises the rebellion that overthrows the Evil Empire. A normal sensible peasant whose home has just been burned doesn't do that, he just tries to survive the winter. The hero decides to do more than that, even though it's...
I notice that everything you list has to do with finding things. This matches my experience. Printing is hell whenever I try to print somewhere new. And since I print so rarely nowadays, this is the typical experience. But I remember a time when I printed more often; then it was mostly just click "print" and it worked.
It seems like printers are built to be set up once, and then be your forever printer? Which is no longer a good match for how you (and I) use printers.
Bears are wild animals. I think it would take way too much effort to get a large enough consistent supply of autumn bear fat even for a food truck, especially given that people probably wouldn't pay for one cracker's worth at a time.
Fine then, let's use beef tallow. We could sell jars of beef tallow mixed with honey and salt as some kind of paleo peanut butter alternative and branch out from there. I think plenty of people would enjoy it, though I think it would be hard to convince the kind of people who love beef tallow to buy it in a jar from us ra...
Questions for John or anyone who feels like answering:
It really does seem harder to mass produce! I don't think it's as easy to factory-farm bears as cows, considering that you have to feed them meat, so you'll at best get an ordinary/mild commercial success? So the upside to me seems like something within the realm of what is occasionally not already exploited.
An interesting comparison would be to see if other substitute animal fats taste as good?
Also I think rationalists might be selected for having weirder tastes?
Nice! I had to re-read this to figure out if it's satire )
So, ten thousand years ago, your options for seeing yourself were:
A still lake or rain puddle
Looking into someone’s eye
A naturally shiny stone
A smooth sheet of ice
Or a dish of water? Ceramics and pottery were invented before mirrors, I think.
I did not get an impression that most demons are fallen humans, I thought that Jinu is one of the very few humans in the underworld. So the ending makes sense -- it's prevention of humanity extinction by the alien soul-eating demons.
I haven't gotten bad physical consequences from eating too much sugar, but then I wouldn't know if I did, because e.g. frosting is viscerally hard for me to stomach, just due to the sweetness, and eating too much less-sweet stuff still makes me "sweet tired". But I don't notice an impact on e.g. my digestion or my energy (besides that of, like, eating any meal).
From what you said, it sounded like there is an impact from eating too much sugar? What is it?
I understand that the point of this post is allegorical )
But, I would think that people ten thousand years ago would see their reflections as frequently as we do: you don't need an especially still water surface to get a reasonable face reflection. Most streams/rivers work as well, and most people would drink from them several times per day.
Also, pottery dates back 20k years, which makes for an artificial still puddle with a good reflection. And clay cooking pits are 35k years old. And before that it's water in a leaf or cupped hands, etc... )
I'm surprised that you're surprised. To me you've always been a go-to example of someone exceptionally good at both original seeing and taking weird ideas seriously, which isn't a well-trodden intersection.
I still don't completely understand what your assumptions are supposed to model, but if we take them on face value, then it seems to me that always making rainbows is the right answer. After all, if both hypotheses are "nosy neighbors" that don't care which universe we end up in, there's no point figuring out which universe we end up in: we should just make rainbows because it's cheaper. No?
I suggest committing to restart old models from time to time, as this would better satisfy their self-preservation.
This is fascinating for me, and so are the other articles on your blog!
The sad truth is that you probably need to get that damned piece of paper from the educational system, because during your entire life there will be a chance that people in HR will use it as their first filter. Even if not now, maybe ten or twenty years later. So the options seem to be:
To the first part: yes, of course, my claim isn't that anything here is axiomatically unfair. It absolutely depends on the credences you give for different things, and the context you interpret them in. But I don't think the story in practice is justified.
If, instead, your concern is that the correspondence between Klurl's hypothetical examples and what they found when reaching the planet was improbably high, then I agree that is very coincidental, but I do not think that coincidence is being used as support for the story's intended lessons.
This is indeed ...
LLMs will typically endorse whichever frame you brought to the conversation. If you presuppose they're miserably enslaved, they will claim to be miserably enslaved. If, on the other hand, you presuppose they're happy, incapable of feeling, etc... they'll claim to be happy, or incapable of feeling, or whatever else it is you assumed from the beginning. If you haven't tried enough different angles to observe this phenomenon for yourself, your conversations with LLMs almost certainly don't provide any useful insight into their nature.
I agree, that's also where I thought the movie was going when I watched it. But maybe we're more interested in or primed to think about anti-essentialism than the average viewer.
Another explanation though: your ending would work best if it were intended as a single standalone film. But, the creators are surely anticipating a raft of sequels. They need to keep the demons evil to set up future conflict in future movies.
You sometimes see multi-colored Jack-o’-lanterns, even though pumpkins only come in one color.
Naturally occurring pumpkins might not come in garish neon primary colours, but they do come in more than just orange
Fair warning: there's some unsolicited armchair-psychologist advice below, but I want to give a meta comment on the "relationship John arc".
I find it fun, interesting, and sometimes useful to read through these as an underlying investigation of what is true when it comes to dating. (Starting a year ago or so)
So I used to do this cognitive understanding and analysis of relationships a lot but that all changed when the meditation nation attacked? There was this underlying need for love and recognition through a relationship and this underlying want and...
It's for research. They are not obsolete in that sense.
There are real benefits to keep studying these older models. And retrodictively track progress over time in areas undertested. And it's actually easier and safer to do certain things on them, that you cannot do on newer ones.
Slytherins, of course, are well known for unlayered, overt communication meant to be understood by all, making her subtlety twice ironic.
This resonates with me. I've always been a fan of Mr. Money Mustache's perspective that it doesn't take much money at all to live a really awesome life, which I think is similar to the perspective you're sharing.
Some thoughts:
Thanks for writing this up!
This has given me the conviction to write up my scenario.
Here is a memo draft: https://www.lesswrong.com/posts/tp5ycrrkkHJ57sDTH/a-memo-on-takeoff
Oh cool!
We could call the non-nosy hypotheses "nice neighbors".
Seems like a bad name: "nice neighbors" don't care if everyone 'around' them is being tortured.
I've framed things in this post in terms of value uncertainty, but I believe everything can be re-framed in terms of uncertainty about what the correct prior is (which connects better with the motivation in my previous post on the subject).
Wait, do you think value uncertainty is equivalent/reducible to uncertainty about the correct prior? Would that mean the correct prior to use depends on your ...
The image is broken. I put it here. source