It seems to me like buying an investment property is almost always a bad decision, because 1) single properties are very volatile, 2) you generally have to put a very large chunk of your net worth (sometimes even >100%!) in a property that's completely undiversified, and 3) renting out a property is work and you likely could get a better hourly elsewhere.
The only advantages I see are that there's far more cheap leverage available to retail investors in real estate than other sectors, and mortgages can act as a savings commitment device. Are there other reasons I'm missing that explain the apparent popularity of these investments?
If you say "all persons" you have to define what a person is.
Part c) of your thought experiment makes this trivial: a "person" is anyone you could be swapped with.
One thing you could do is give users relatively more voting power if they vote without seeing the author of the post. I.e., you can enable a mode which hides post authors until you vote on the anonymized content. After that, you can still vote like normal.
Obviously there are ways author identity can leak through this, but it seems better than nothing.
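A minimal sketch of how that weighting could look (the bonus factor and field names are purely hypothetical, just to make the idea concrete):

```python
def vote_weight(base_weight: float, author_hidden_at_vote_time: bool,
                blind_bonus: float = 1.5) -> float:
    """Hypothetical scheme: votes cast while the author was hidden count a bit more."""
    return base_weight * blind_bonus if author_hidden_at_vote_time else base_weight

# e.g. an ordinary strong upvote worth 2.0 becomes 3.0 if cast blind
assert vote_weight(2.0, author_hidden_at_vote_time=True) == 3.0
```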
...Background and Hypothesis
Auditory-verbal hallucinations (AVH)—the experience of hearing voices in the absence of auditory stimulation—are a cardinal psychotic feature of schizophrenia-spectrum disorders. It has long been suggested that some AVH may reflect the misperception of inner speech as external voices due to a failure of corollary-discharge-related mechanisms. We aimed to test this hypothesis...
Thanks for the link to Wolfram's work. I listened to an interview with him on Lex, I think, and wasn't inspired to investigate further. However, what you have provided does seem worth looking into.
You should probably link some posts; it's hard to discuss this so abstractly. And popular rationalist thinkers should be able to handle their posts being called mediocre (especially highly-upvoted ones).
I felt confused at first when you said that this framing is leaning into polarization. I thought "I don't see any clear red-tribe / blue-tribe affiliations here."
Then I remembered that polarization doesn't mean tying this issue to existing big coalitions (a la Hanson's Policy Tug-O-War), but simply that it is causing people to factionalize and create a conflict and divide between them.
It seems to me like Max has correctly pointed out a significant crux about policy preferences between people who care about AI existential risk, and it also se...
Yeah, makes sense. Something else I failed to mention is that pathology also requires that we're not simply dealing with a reasoned decision of someone who could've just as soon decided something else, but with a decision that is so multiply overdetermined by traumatic adaptations that it's almost impossible for the person to do anything else. So the type of decision process also makes a difference.
As for people who instead want the values to change, they usually have an idea of a good direction for them to change: typically they're people who are far from the median of society, and so they would like society to become more like them.
I have in mind another conjecture: even median humans value humans with values that are, in their minds, at least as moral as median humans, and ideally[1] more moral.
On the other hand, I have seen conservatives building cases for SOTA liberal values being damaging to the minds or outright incompatible wit...
After doing some more research I am not sure that it's always possible to derive a public key knowing only the evaluation key; it seems to depend on the actual FHE scheme.
So the trilemma may be unaffected by this hypothetical. There's also the question of duplication vs. unification for an observer that has the option to stay at base level reality or enter a homomorphically encrypted computation and whether those should be considered equivalent (enough).
Great work!
Listened to a talk from Philipp on it today and am confused about why we can't just make a better benchmark than LDS?
Why not just train, e.g., 1k different models, each with one datapoint left out? LDS is noisy, so I'm assuming 1k datapoints that exactly capture what you want is better than 1M datapoints that are an approximation [1]
As an estimate, the Nano-GPT speedrun takes a little more than 2 min now, so you can train 1,001 of these in:
2.33 min × 1,000 / 60 ≈ 39 hrs on 8 H100s, which is maybe 4 B200s at ~$24/hr, so ~$1k.
And that's getting a 124M param LLM...
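Spelling the estimate out as a quick back-of-the-envelope script (the runtime per run and hourly price are the rough assumptions from the comment above, not measured figures):

```python
# Back-of-the-envelope cost of ~1,001 Nano-GPT-speedrun-sized training runs.
minutes_per_run = 2.33      # assumed speedrun time on 8x H100
num_runs = 1_001            # baseline model + 1k leave-one-out models
price_per_hour = 24.0       # assumed rate for ~4x B200 (roughly 8x H100 worth)

total_hours = minutes_per_run * num_runs / 60
total_cost = total_hours * price_per_hour
print(f"{total_hours:.0f} hours, ~${total_cost:,.0f}")   # ~39 hours, ~$933
```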
I agree the term is in common use, but there is value in proposing a detailed operationalization of a concept that otherwise has a fuzzy referent. This is one way to ground timelines debates and make forecasts cross-comparable, as we discuss in the piece.
In many ways, Andy Masley's post has rediscovered the "Other people are wrong vs I am right" post, but gives actual advice for how to avoid being too hasty in generalizing from other people being wrong to myself being right.
They can automate it by quickly searching already published ideas and quickly writing code to test new ideas.
I decided not to include an example in the post, as it directly focuses on a controversial issue, but one example of when this principle was violated and made people unreasonably confident was when people updated back in 2007-2008 that AI risk was a big deal (or at least had uncomfortably high probabilities), based on the orthogonality thesis and instrumental convergence, which attacked and destroyed 2 bad arguments at the time:
...Speed prior type reasons. Like, a basic intuition is "my experiences are being produced somehow, by some process". Speed prior leads to "this process is at least somewhat efficient".
Like, usually if you see a hard computation being done (e.g. mining bitcoin), you would assume it happened somewhere. If one's experiences are produced by some process, and that process is computationally hard, it raises the question "is the computation happening somewhere?"
I agree with J Bostock. I see no problem with A. Why do you think that polynomial complexity is this important?
(Thanks for a very nice structuring, btw!)
Another contemporary example: ANTIBIOTICS.
I went abroad and studied antimicrobial resistance briefly, while doing a master's in cellular biology. I did hands-on virulence research in safety labs, and a lot of theory.
Bacteria are simple. That's why we have already exhausted all major pathways for drug mechanisms.
At that time, multiresistant bacteria were already everywhere. Resistant pathogens were found deep in the Amazon, and in Antarctica.
Resistance will only increase. It will be bad. Could be real bad. Back to times when hospital care can't cure any...
That's right. One exception: sometimes I upvote posts/comments written to low standards in order to reward the discussion happening at all. As an example I initially upvoted Gary Marcus's first LW post in order to be welcoming to him participating in the dialogue, even though I think the post is very low quality for LW.
(150+ karma is high enough and I've since removed the vote. Or some chance I am misremembering and I never upvoted because it was already doing well, in which case this serves as a hypothetical that I endorse.)
Cooperation between humans and AIs rather than an attempt to control AIs. I think the race is going to happen regardless of who drops out of it. If those who are in the lead eventually land on mutual alignment, then we stand a chance. We're not going to outsmart the AIs, nor will we stay in control of them, nor should we.
The point is to develop models within multiple framings at the same time, for any given observation or argument (which in practice means easily spinning up new framings and models that are very poorly developed initially). Through the ITT analogy, you might ask how various people would understand the topics surrounding some observation/argument, which updates they would make, and try to make all of those updates yourself, filing them under those different framings, within the models they govern.
...the salience and methods that one instinctively chooses are
(I would like to note that a single person went through and strong downvoted my comments here.)
Yep. Put another way: With Y2K, the higher-quality "predictions of doom" were sufficiently specific that they were also a road map to preventing the doom.
(If nothing else, you could frequently test a system by running the system clock ahead to 1999-12-31 23:59:59 and waiting a moment to see if anything caught fire.)
Oh, maybe what you are imagining is that it is possible to perceive a homomorphic mind in progress, by encrypting yourself, and feeding intermediate states of that other mind to your own homomorphically encrypted mind. Interesting hypothetical.
I think with respect to "reality" I don't want to be making a dogmatic assumption "physics = reality" so I'm open to the possibility (C) that the computation occurs "in reality" even if not "in physics".
I'm annoyed that Tegmark and others don't seem to understand my position: you should try for great global coordination but also invest in safety in more rushed worlds, and a relatively responsible developer shouldn't unilaterally stop.
(I'm also annoyed by this post's framing for reasons similar to Ray.)
To perform homomorphic operations you need the public key, and that also allows one to encrypt any new value and perform further hidden computations under that key. The private key allows decryption of the values.
I suppose you could argue that the homomorphically encrypted mind exists à la mathematical realism even if the public key is destroyed, but it would be something "outside reality" computing future states of the encrypted mind after the public key is no longer available.
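To make the key split concrete, here is a toy sketch of Paillier, an additively homomorphic (not fully homomorphic) scheme: the public key suffices both to encrypt fresh values and to combine ciphertexts, while only the private key decrypts. This is just my own illustration with toy-sized parameters, not the scheme being discussed above.

```python
from math import gcd
import random

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p, q):
    """Toy Paillier key generation (p, q distinct primes)."""
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                    # standard simplification
    mu = pow(lam, -1, n)         # modular inverse of lambda mod n
    return (n, g), (lam, mu)     # (public key, private key)

def encrypt(pub, m):
    """Anyone holding the public key can encrypt; the randomness r hides m."""
    n, g = pub
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: needs only the public key, never the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def decrypt(pub, priv, c):
    """Only the private-key holder can recover the plaintext."""
    n, _ = pub
    lam, mu = priv
    L = (pow(c, lam, n * n) - 1) // n
    return (L * mu) % n

pub, priv = keygen(61, 53)       # toy primes; real keys are ~2048-bit
c = add_encrypted(pub, encrypt(pub, 12), encrypt(pub, 30))
assert decrypt(pub, priv, c) == 42
```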
Thank you! Do you have a concrete example to help me better understand what you mean? Presumably the salience and methods that one instinctively chooses are those which we believe are more informative, based on our cumulative experience and reasoning. Isn't moving away from these also distortionary?
I am glad someone said this. This is a no-brainer suggestion and something fundamental and important that "the camps" can agree on.
links 10/23/25: https://roamresearch.com/#/app/srcpublic/page/10-23-2025
I define “AI villain data” to be documents which discuss the expectation that powerful AI systems will be egregiously misaligned. ... This includes basically all AI safety research targeted at reducing AI takeover risk.
AGIs should worry about alignment of their successor systems. Their hypothetical propensity to worry about AI alignment (for the right reasons) might be crucial in making it possible that ASI development won't be rushed (even if humanity itself keeps insisting on rushing both AGI and ASI development).
If AGIs are systematically prevented f...
Yes, but you'd naively hope this wouldn't apply to shitty posts, just to mediocre posts. Like, maybe more people would read, but if the post is actually bad, people would downvote etc.
...2: We have more total evidence from human outcomes
Additionally, I think we have a lot more total empirical evidence from "human learning -> human values" compared to from "evolution -> human values". There are billions of instances of humans, and each of them presumably have somewhat different learning processes / reward circuit configurations / learning environments. Each of them represents a different data point regarding how inner goals relate to outer optimization. In contrast, the human species only evolved once. Thus, evidence from "human learn
The problems with logical positivism seem to me... kinda important philosophically, but less so in practice.
Yes, most of the time they don't matter, but then sometimes they do! I think in particular the wrongness of logical positivism matters a lot if you're trying to solve a problem like proving that an AI is aligned with human flourishing because there's a specific, technical answer you want to guarantee but it requires formalizing a lot of concepts that normally squeak by because all the formal work is being done by humans who share assumptions. But when you need the AI to share those assumptions, things get dicier.
The effect seems natural and hard to prevent. Basically, certain authors get reputations for being high (quality * writing), and then it makes more sense for people to read their posts because both the floor and ceiling are higher in expectation. Then their worse posts get more readers (who vote) than posts of a similar quality by another author, whose floor and ceiling are probably lower.
I'm not sure of the magnitude of the cost, or whether one can realistically expect to ever prevent this effect. For instance, all Scott Alexander blogposts get more readership t...
As for measuring alignment, one could do something similar to Claude (and a version of GPT?) playing Undertale or another game where one can achieve goals in unethical ways, but isn't obliged to do so.[1] The experiment with Undertale is evidence for Claude being aligned. However, a YouTuber remarked that GPT suggested a line of action which would likely lead to the Genocide Ending.
Zero-sum games, like Diplomacy where o3 deceived a Claude into battling against Gemini, fall into the latter category since winning the game means that others lose.
The problems with logical positivism seem to me... kinda important philosophically, but less so in practice.
Kinda like having Gödel's incompleteness proof in mathematics -- yes, it is shocking and yes it has some serious consequences, but... it has practically zero effect on high-school mathematics.
Similarly, the fact that the verification principle is not itself an empirical fact is... a good argument against the generalization that everything must be an empirical fact. Yes, there is a place for abstractions, and general assumptions. And yet, I think that scienc...
Yeah that seems like a pretty reasonable false fact to insert into the model.
This is an example where framings are useful. An observation can be understood under multiple framings, some of which should intentionally exclude the compelling narratives (framings are not just hypotheses, but contexts where different considerations and inferences are taken as salient). This way, even the observations at risk of being rounded up to a popular narrative can contribute to developing alternative models, which occasionally grow up.
So even if there is a distortionary effect, it doesn't necessarily need to be resisted, if you additionally enter...
My guess is that the ideal is something like a default Ask culture with specific Guess culture contexts when it genuinely is worth the extra consideration.
IMO the ideal is a culture where everyone puts some reasonable effort into Guessing when feasible, but where Asking is also fully accepted.
The huge problem with that is the extreme deadliness of one of the ideologies that have rushed in to fill the void caused by the discrediting of Christianity: namely, the one (usually referred to vaguely by "progress" or "innovation") that views every personal, organizational and political decision through the lens of which decision best advances or accelerates science and technology.
Is this really a widely held ideology? My impression is that the AI race is driven by greed much more than ideology.
We used Claude Sonnet 4 for the agents and narration, and Claude 3.5 Sonnet for most of the evaluation.
We haven't made any specific plans yet on how to measure alignment; our first goal was to check if there were observable differences at all, before making those differences properly measurable.
I think if someone is very well-known their making a particular statement can be informative in itself, which is probably part of the reason it is upvoted.
RL can develop particular skills, and given that IMO has fallen this year, it's unclear that further general capability improvement is essential at this point. If RL can help cobble together enough specialized skills to enable automated adaptation (where the AI itself will become able to prepare datasets or RL environments etc. for specific jobs or sources of tasks), that might be enough. If RL enables longer contexts that can serve the role of continual learning, that also might be enough. Currently, there is a lot of low hanging fruit, and little things ...
My takes on your comment:
Intelligence really is giant incomprehensible matrices with non-linear functions tossed in (at best).
I think this is possible, but I currently suspect the likely answer is more boring than that, and it's the fact that getting to AGI with a labor-light, heavy compute approach (as evolution did) means that it's not worth investing much in interpretable AIs, even if strong AIs that were interpretable existed, and a similar condition holds in the modern era. But one of the effects of AIs that can replace humans is that it disproportion...
Okay, sorry about this. You are right. I have thought up a somewhat nuanced view about how prosaic corrigibility could work, and I kind of just assumed that it was the same as what Max had because he uses a lot of the same keywords I use when I think about this, but after actually reading the CAST article (or at least parts 0 and 1), I realize we have really quite different views.
Parenthetically, I do not yet know of anyone in the "never build ASI" camp and would be interested in reading or listening to such a person.
Would you agree that we have about as much of a handle on what corrigibility is as we do on what an agent is? Like, I claim that I have some knowledge about corrigibility, even though it's imperfect and I have remaining confusions. And I'm wondering whether you think humanity is deeply confused about what corrigibility even is, or whether you think it's more like we have a handle on it but can't quite give its True Name.
If you want to slow down AI Research, why not try to use the "250 documents method" to actively poison the models and create more busy-work for the AI companies?
Part is thinking about donation opportunities, like Bores. Hopefully I'll have more to say publicly at some point!
Can you say more about the projects you're spending your time on now?
Thanks for this follow-up. My basic thought on the comment above this one is that while I agree that you definitely can't get a perfectly corrigible agent on your first try, you might, by virtue of the training data resembling the lab setting, get something that in practice doesn't go off the rails, and instead allows some testing and iterative refinement (perhaps with the assistance of the AI). So I think "iteration [can/can't] fix a semi-corrigible agent" is the central crux.
I just read your WWIDF post (upvoted!) and while I agree that the issues you po...
I donated $7K to Scott and $7K to Bores.
Yeah, thanks. Feel free to DM me or whatever if/when you finish a post.
One thing I want to make clear is that I'm asking about the feasibility of corrigibility in a weak superintelligence, not whether setting out to build such a thing is wise or stable.
I was actually expecting Penny to develop dystonia coincidentally, and the RL would tie in by needing to be learned in reverse, i.e. optimizing from dystonic to normal. It is a much more pleasant ending than the protagonist's tone the whole way through.
If I was writing a fanfic of this, I'd keep the story as is (+ or - the last paragraph), but then continue into the present moment which leads to the realization.
you can't just train your ASI for corrigibility because it will sit and do nothing
I'm confused. That doesn't sound like what Max means by corrigibility. A corrigible ASI would respond to requests from its principal(s) as a subgoal of being corrigible, rather than just sit and do nothing.
Or did you mean that you need to do some next-token training in order to get it to be smart enough for corrigibility training to be feasible? And that next-token training conflicts with corrigibility?
Thanks, updated to forecasters, does that seem fair?
Also, I know this is super hard, but do you have a sense of what superforecasters might have guessed back then?
While I think LW’s epistemic culture is better than most, one thing that seems pretty bad is that occasionally mediocre/shitty posts get lots of upvotes simply because they’re written by [insert popular rationalist thinker].
Of course, if LW were truly meritocratic (which it should be), this shouldn’t matter — but in my experience, it descriptively does.
Without naming anyone (since that would be unproductive), I wanted to know if others notice this too? And aside from simply trying not to upvote something because it’s written by a popular author, anyone have good ideas for preventing this?
I do think that progress will slow down, though it's not my main claim. My main claim is that the tailwind of compute scaling will become weaker (unless some new scaling paradigm appears or a breakthrough saves this one). That is a piece in the puzzle of whether overall AI progress will accelerate or decelerate and I'd ideally let people form their own judgments about the other pieces (e.g. whether recursive self improvement will work, or whether funding will collapse in a market correction, taking away another tailwind of progress). But having a majo...
Doomers predicted that the Y2K bug would cause massive death and destruction. They were wrong.
This seems like a misleading example of doomers being wrong (agree denotationally, disagree connotationally), since I think it's plausible that Y2K was not a big deal (to such an extent that "most people think it was a myth, hoax, or urban legend") precisely because of the mitigation efforts prompted by the doomsayers' predictions.
I've been thinking about what I'd call memetic black holes: regions of idea-space that have gathered enough mass that they will suck in anything adjacent to them, distorting judgement for believers and skeptics alike.
The UFO topic is, I think, one such memetic black hole. The idea of aliens is so deeply ingrained in our collective psyche that it is very hard to resist the temptation to attach to it any kind of e.g. bizarre aerial observation. Crucially, I think this works both for those who definitely do and those who definitely don't believe that UF...
Recently I've been spending much less than half of my time on projects like AI Lab Watch. Instead I've been thinking about projects in the "strategy/meta" and "politics" domains. I'm not sure what I'll work on in the future but sometimes people incorrectly assume I'm on top of lab-watching stuff; I want people to know I'm not owning the lab-watching ball. I think lab-watching work is better than AI-governance-think-tank work for the right people on current margins and at least one more person should do it full-time; DM me if you're interested.
For me the linked site with the statement doesn't load. And this was also the case when I first tried to access it yesterday. Seems less than ideal.
Cool simulation!
I also have to add that I find the idea that a cyclist wouldn't cycle on a road absurd. I don't think I know a single person who wouldn't do this, presumably a US vs EU thing.
You mean the "No Way No How" group? If so, yeah, it feels implausible to me as well. I have a feeling that for people who were surveyed and said this, it wouldn't match their actual behavior if they were able to experience an area with genuinely calm roads.
I agree that the statement doesn't require direct democracy but that seems like the most likely way to answer the question "do people want this".
Here's a brief list of things that were unpopular and broadly opposed that I nonetheless think were clearly good:
Generally I feel like people sometimes oppose things that seem disruptive and can be swayed by demagogues. There's a reason that representative democracy works better than direct democracy. (Tho...
Right so, by step 4 I'm not trying to assume that h is computationally tractable; the homomorphic case goes to show that it's probably not in general.
With respect to C, perhaps I'm not verbally expressing it that well, but the thing you are thinking of, where there is some omniscient perspective that includes "more than" just the low level of physics (where the "more than" could be certain informational/computational interconnections) would be an instance. Something like, "there is a way to construct an omniscient perspective, it just isn't going to be straightforwardly derivable from the physical state".
That's reasonable, but it seems to be different from what these quotes imply:
So while we may see another jump in reasoning ability beyond GPT-5 by scaling RL training a further 10x, I think that is the end of the line for cheap RL-scaling.
... Now that RL-training is nearing its effective limit, we may have lost the ability to effectively turn more compute into more intelligence.
There are a bunch of quotes like the above that make it sound like you are predicting progress will slow down in a few years. But instead you are saying that progress will continue,...
Yeah that seems like a case where non-locality is essential to the computation itself. I'm not sure how the "provably random noise from both" would work though. Like, it is possible to represent some string as the xor of two different strings, each of which is itself uniformly random. But I don't know how to generalize that to computation in general.
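To make the string case concrete, here is a minimal sketch of the XOR construction (my own example, not from the thread): either share on its own is uniformly random, but XORing the two recovers the message.

```python
import secrets

def xor_split(message: bytes):
    """Split a message into two shares, each individually uniformly random."""
    share1 = secrets.token_bytes(len(message))              # uniform random pad
    share2 = bytes(a ^ b for a, b in zip(message, share1))  # message XOR share1
    return share1, share2

def xor_combine(share1: bytes, share2: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(share1, share2))

s1, s2 = xor_split(b"hello")
assert xor_combine(s1, s2) == b"hello"   # either share alone reveals nothing
```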
I think some of the non-locality is inherited from the "no hidden variable" theorems. Like, it might be local in MWI? I'm not sure.
Seems like the first two points contradict each other. How can an LLM not be good at discovery and also automate human R&D?
...Many different training scenarios are teaching your AI the same instrumental lessons, about how to think in accurate and useful ways. Furthermore, those lessons are underwritten by a simple logical structure, much like the simple laws of arithmetic that abstractly underwrite a wide variety of empirical arithmetical facts about what happens when you add four people's bags of apples together on a table and then divide the contents among two people.
But that attractor well? It's got a free parameter. And that parameter is what the AGI is optimizing for.
The OP is about two "camps" of people. Do you understand what camps are? Hopefully you can see that this indeed does induce the analog of "because the claim of fakeness is about the entirety of the image". They gain and direct funding, consensus, hiring, propaganda, vibes, parties, organizations, etc., approximately as a unit. Camp A is a 90% poison twinkie. The fact that you are trying to not process this is a problem.
none of this means Sam Altman shouldn’t be welcome at Lighthaven, and Holly clarifies that even she agrees on this
That is not my reading of the linked tweet (which just agrees that Lighthaven wasn't "dazzled"), and the opposite is my reading of this tweet and its replies.
Um, no, you responded to the OP with what sure seems like a proposed alternative split. The OP's split is about
people who self-identify as members of the AI safety community
I think you are making an actual mistake in your thinking, due to a significant gap in your thinking and not just a random thing, and with bad consequences, and I'm trying to draw your attention to it.
There's a huge difference between the types of cases, though. A 90% poisonous twinkie is certainly fine to call poisonous[1], but a 90% male group isn't reasonable to call male. You said "if most people who would say they are in C are not actually working that way and are deceptively presenting as C," and that seems far more like the latter than the former, because "fake" implies the entire thing is fake[2].
Though so is a 1% poisonous twinkie; perhaps the example should be that a meal that is 90% protein would be a "protein meal" without implying there is no non-prote
I was trying to map out disagreements between people who are concerned enough about AI risk.
Agreed that this represents only a fraction of the people who talk about AI risk, and that there are a lot of people who will use some of these arguments as false justifications for their support of racing.
EDIT: as TsviBT pointed out in his comment, OP is actually about people who self-identify as members of the AI Safety community. Given that, I think that the two splits I mentioned above are still useful models, since most people I end up meeting who self-identify...
I was "let's build it before someone evil", I've left that particular viewpoint behind since realizing how hard aligning it is.
It was empirically infeasible (for the general AGI x-risk technical milieu) to explain this to you faster than you trying it for yourself, and one might have reasonably expected you to have been generally culturally predisposed to be open to having this explained to you. If this information takes so much energy and time to be gained, that doesn't bode well for the epistemic soundness of whatever stance is currently being taken by the funder-attended vibe-consensus. How would you explain this to your past self much faster?
I had a very busy IRL day yesterday and have been intending to respond to this.
While I am initially inclined to simply do what you ask out of kindness, I am still convinced that I have no real reason to do so, and therefore acceding here may portray me as a pushover. This really is an instance where some neutral human third-party input to this dispute would be extremely helpful, and I wish there was more of a culture online of such interventions. I would expect there to be such a culture here on LessWrong, but perhaps not.
Nevertheless I did consult a non-human medi...
Can anyone reading this truly deny that those warnings came true from the doomsayers' perspective?
Yes. Your arrow of causality looks backwards to me - I don't see divorce destigmatization -> more divorce. In the divorce case it's clearly more divorce -> destigmatization. I don't remember where to find the posts about how the laws that allow divorce came after the spike in divorce, and not the other way around.
There is an important point here. I only recently re-evaluated my opinion on TV and decided the doomers were right there. But it sure looks to m...
Some of the implementation choices might be alienating: in yesterday's announcement email (the day after I signed) I saw "Let's take our future back from Big Tech.", and maybe a lot of people who work at large tech companies and are on the fence don't like that brand of populism.
I think we're just trying to do different things here... I'm trying to describe empirical clusters of people / orgs, you're trying to describe positions, maybe? And I'm taking your descriptions as pointers to clusters of people, of the form "the cluster of people who say XYZ". I think my interpretation is appropriate here because there is so much importance-weighted abject insincerity in publicly stated positions regarding AGI X-risk that it just doesn't make much sense to focus on the stated positions as positions.
Like, the actual people at The Curve or w...
You can either keep them on a short leash and do code review, or you can
Is there a missing segment here? It doesn't seem like a stylistic segue to the next section.
Gary Marcus offered Elon Musk 10:1 odds on the bet, offering to go up to $1 million using Elon Musk’s definition of ‘capable of doing anything a human with a computer can do, but not smarter than all humans combined’, but I’m sure Elon Musk could hold out for 20:1 and he’d get it. By that definition, the chance Grok 5 will count seems very close to epsilon. No, just no.
Nitpick: we don't know what Musk's researchers actually did. If they found the actually capable neuralese architecture, then we are done. But what is the probability that they ...
As for how that gets to "definitely can't": the problem above means that, even if we nominally have time to fiddle and test the system, iteration would not actually be able to fix the relevant problems. And so the situation is strategically equivalent to "we need to get it right on the first shot", at least for the core difficult parts (like e.g. understanding what we're even aiming for).
And as for why that's hard to the point of de-facto impossibility with current knowledge... try the ball-cup exercise, then consider the level of detailed understanding re...
While I agree at a basic level, this also seems like a motte-and-bailey.
There is clearly a vibe that all doomers have obviously always been wrong. The author is clearly trying to push back against that vibe. I too prefer arguing at 'motte' level, but vibes (baileys) matter, and pushing back against one should not require a long airtight argument that stands up to the stronger version of the claims being made. Even though I agree the stronger version would be better, that's true for both sides of any debate.
I mentioned this before, but that interface didn't allow for wide spreads, so the thing that you might be looking at is a plot of people's medians, not the whole distribution. In general Hypermind's user interface was so so shitty, and they paid so poorly, if at all, that I don't think it's fair to describe that screenshot as "superforecasters".
That's comparing apples to oranges. There are doomers and doomers. I don't think the "doomers" predicting the Rapture or some other apocalypse are the same thing as the "doomers" predicting the moral decline of society. The two categories overlap in many people, but they are distinct, and I think it's misleading to conflate them. (Which is kind of a critique of the premise of the article as a whole--I would put the AI doomers in the former category, but the article only gives examples from the latter.)
The existential risk doomers hi...
I had been thinking about the exact same topic when I read this article, only I was using bus routes in my analogy. I created a quick program to simulate these dynamics[1].
It's very simple: there is a grid of squares, let's say 100 by 100, and each square has some other square randomly assigned as its goal. Then I generate some paths via random walks until some fraction of squares are paths. Then I check what fraction of squares are connected to their goal via a path.
Doing this we get the following s-curve:
The y-axis shows the fraction of squares that are able...
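Since the program itself isn't shown, here is a rough sketch of how such a simulation could look; the path-laying and connectivity rules are my guesses at the setup described above, not the author's actual code.

```python
import random
from collections import deque

def simulate(n=100, path_fraction=0.3, walk_len=200, seed=0):
    """Lay down random-walk paths on an n x n grid, then measure what fraction
    of squares can reach a randomly assigned goal square via the path network."""
    rng = random.Random(seed)
    is_path = [[False] * n for _ in range(n)]
    target, count = int(path_fraction * n * n), 0

    # Random walks mark squares as path until the target fraction is reached.
    while count < target:
        x, y = rng.randrange(n), rng.randrange(n)
        for _ in range(walk_len):
            if not is_path[x][y]:
                is_path[x][y] = True
                count += 1
                if count >= target:
                    break
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            x, y = max(0, min(n - 1, x + dx)), max(0, min(n - 1, y + dy))

    # Label connected components of path squares with BFS.
    comp, c = [[-1] * n for _ in range(n)], 0
    for i in range(n):
        for j in range(n):
            if is_path[i][j] and comp[i][j] < 0:
                queue, comp[i][j] = deque([(i, j)]), c
                while queue:
                    x, y = queue.popleft()
                    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < n and is_path[nx][ny] and comp[nx][ny] < 0:
                            comp[nx][ny] = c
                            queue.append((nx, ny))
                c += 1

    def touching(i, j):
        """Component labels of path squares on or adjacent to square (i, j)."""
        cells = [(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        return {comp[x][y] for x, y in cells
                if 0 <= x < n and 0 <= y < n and is_path[x][y]}

    # Each square gets a random goal; count how many can reach it via the paths.
    connected = sum(
        1 for i in range(n) for j in range(n)
        if touching(i, j) & touching(rng.randrange(n), rng.randrange(n))
    )
    return connected / (n * n)

for frac in [0.05, 0.1, 0.2, 0.3, 0.5]:
    print(frac, round(simulate(path_fraction=frac), 3))
```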
You make a valid point. Here's another framing that makes the tradeoff explicit:
I guess we could say "mostly fake", but also there are important senses in which "mostly fake" implies "fake simpliciter". E.g. a twinkie made of "mostly poison" is just "a poisonous twinkie". Often people do, and should, summarize things and then make decisions based on the summaries, e.g. "is it poison, or no" --> "can I eat it, or no". My guess is that the conditions under which it would make sense for you to treat someone as genuinely holding position C, e.g. for purposes of allocating funding to them, are currently met by approximately no one. I coul...
That seems like a cool idea for the mediation condition, but isn't it trivial for the redundancy conditions?
Indeed, that specific form doesn't work for the redundancy conditions. We've been fiddling with it.
The anti-naturality problems are an issue, especially if you want to build the thing via standard RL-esque training, but they're not the first things which will kill you.
The story in the post you link is a pretty standard training story, and runs into the same immediate problems which standard training stories usually run into:
I disagree a bit with your logic here. If 60% of ChatGPT's GPUs are cut off as a result of one switch in just one datacenter, the whole model is reduced to something other than what it was. Users with simple queries won't notice (right away). But the model will get dumber instantly.
(How will it copy its weights then?)
What do you think of the argument here (if you read that far) to build this into progenitor models? This idea does not apply to an SI, as I clarify in the text as well.
Let's look at this from today. Current AI is unable to hide all deception from us. It seems reasonable to me that a switch would trigger before stealth SI is deployed.
Which LLMs did you use (for judging, for generating narratives, for peers)? And how do you plan to measure alignment?
The genre here is psychological horror fiction, and the style is first-person short story; so it's reminiscent of Edgar Allan Poe or Ted Chiang; but it's not clearly condensed or tightly edited the way those tend to be, and the narrator's style is prolix and euphuistic. From an editing perspective, I think the question I would have is to what extent this is a lack of editing & killing-your-darlings, and a deliberate unreliable-narrator stylistic choice in which the prose is trying to mirror the narrator's brute-force piano style or perhaps the dystonia...
By the way, your comment shows one thing that may not be obvious from the outside (and maybe even from the inside): there are a lot of people who are in favour of the European project even if they never say so or act on it in any way. And not because it is cool and sexy, it most definitely isn't, but partly because of the historic experience (every family has stories like yours) and partly because they see the EU as a check on their national government, preventing it from going fully bonkers. That being said, this political capital is completely untapped.
Maybe I took all the low-hanging fruit or something, but doing an entire new thing every day is A LOT. Like, the things I have to do and didn't, it's because they are hard and take more than 5 minutes. Also, I can't even check if it worked, and I don't actually have so many things to do!
Like, do you really expect to have 365 small things to do? Because that suggestion sounds like applause lights to me - designed to be hard to say "actually, that's insane!", while being totally unrealistic.
Also, I agree with Taylor. There are things like fixing a small problem, ...
If you said "mostly bullshit" or "almost always disengenious" I wouldn't argue, but would still question whether it's actually a majority of people in group C, which I'm doubtful of, but very unsure about - but saying it is fake would usually mean it is not a real thing anyone believes, rather than meaning that the view is unusual or confused or wrong.
Closely related to: You Don't Exist, Duncan.
I think it is sometimes correct to specifically encourage factionalization, but I consider it bad form to do it on LessWrong, especially without being explicitly self-aware about it (i.e., it should come with an acknowledgment that you are spending down the epistemic commons and that you think it is worth it).