Hey! Sorry the site was down so long; I accidentally let the payment lapse. It's back up and should stay that way, if you'd still like to use it. I also added a page where you can toggle between the predictions.
But no worries if you're happy with your own. :)
EDIT: I just realized you only posted this 2 days ago! I didn't see your version before fixing the site; I had actually just logged in to update anyone who cared. :P
In the same way, "AI Alignment" excludes e.g. people who are inclined to believe superintelligences will know better than us what is good, and who don't want to hamstring them. You can think we're well rid of these people. But you're still excluding people and thereby reducing the amount of thinking that will be applied to the problem.
I'm not sure what someone who essentially thinks there is no problem can contribute to its solution. That said, I get the gist of the argument, and IMO you have a point about stressing the two complementary aspects of a mind. Maybe "Artificial Volition"? "Intention" feels to me like it alliterates so much with "Intelligence" that it circles back from catchiness to being confusing.
Anyone who is confident that no UFOs are truly anomalous, please feel free to extend me odds for a bet here https://www.lesswrong.com/posts/t5W87hQF5gKyTofQB/ufo-betting-put-up-or-shut-up
I have already paid out to two bettors so far, and would like some more.
Running water doesn't create the conditions to permanently disempower almost everyone, AGI does. What I'm talking about isn't a situation in which initially only the rich benefit but then the tech gets cheaper and trickles down. It's a permanent trap that destroys democracy and capitalism as we know them.
Consider two claims:
1. Any system can be modeled as maximizing some utility function.
2. A corrigible system can't be usefully modeled as maximizing a utility function.
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
I expect that many people's intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable incompatibility is fairly obvious, so the two claims might make a useful test for noticing when one is relying on yay/boo reasoning about utilities in an incoherent way.
Also perhaps of interest might be this discussion from the SSC subreddit a while back where someone detailed their pro-Bigfoot case.
Sometimes I have an internal desire to do something different from what I think should be done (for example, I might desire to play a game while also thinking the better choice is to read). I've been experimenting with using randomness to mediate this: I keep a D20 with me, give each side of the dispute odds proportional to the strength of its resolve, and then roll the die.
In theory, this means neither side will overpower the other, and even a small resolve still has a chance. I'm not sure how useful this is, but it's fun, and can sort of give me motivation (I've tried to internalize this kind of roll as a rule not to break without good reason).
Also, when I'm merely deciding between some options, sometimes I'll roll more casually with equal odds, and it'll help me realize which one I really wanted all along (if I don't like the roll's outcome).
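The mechanism could be sketched like this (a toy sketch; the function name and the example weights standing in for D20 faces are illustrative, not from the comment):

```python
import random

def mediate(options):
    """Pick one option, with probability proportional to the 'resolve'
    weight assigned to each side of the dispute -- the software analogue
    of giving each side some faces of a D20 and rolling it."""
    labels = list(options)
    weights = [options[label] for label in labels]
    return random.choices(labels, weights=weights)[0]

# e.g. give 'read' 14 faces of the D20 and 'play a game' 6:
choice = mediate({"read": 14, "play a game": 6})
```

Even a side with a small weight retains a nonzero chance of winning, which is the point: neither desire is simply overruled.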
How in-depth have you looked at the studies about declining performance in doctors with age? An obvious alternative hypothesis is that doctors gain skill as they age, and therefore tend to take on higher-risk patients and procedures with worse outcomes. I am not saying that's what's going on here - I'd just like to know if this is something you've looked into.
If by 'very unlikely' you think the likelihood is <1%, you can get nearly free money by betting against it: https://www.lesswrong.com/posts/t5W87hQF5gKyTofQB/ufo-betting-put-up-or-shut-up
I think the user is still willing to send out a few thousand dollars.
Epistemic learned helplessness: Idk man, do we even need a theory of impact? In what world is 'actually understanding how our black box systems work' not helpful?
The real question is not whether (mechanistic) interpretability is helpful, but whether it could also be "harmful", i.e., speed up capabilities without delivering commensurate or greater improvements in safety (Quintin Pope also talks about this risk in this comment), or create a "foom overhang" as described in "AGI-Automated Interpretability is Suicide". Good interpretability also creates an infosec/infohazard attack vector, as I described here.
Thus, the "theory of impact" for interpretability should not just list its potential benefits, but also explain why those benefits are expected to outweigh the potential harms, timeline shortening, and new risks.
It seems that the "ethical simulator" from point 1 and the LLM-based agent from point 2 overlap, so you just overcomplicate things if you make them two distinct systems: an LLM prompted with the right "system prompt" (virtue ethics), doing some branching-tree search for optimal plans according to some trained "utility/value" evaluator (consequentialism), and filtering out plans which include actions that are always prohibited (law, deontology). The second component is the closest to what you described as an "ethical simulator", but is not quite it: the "utility/value" evaluator cannot say whether an action or a plan is ethical in absolute terms; it can only compare plans proposed for the particular situation by some planner.
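A minimal sketch of how the three components above could fit together (all function names and the plan representation here are illustrative placeholders, not anything from the original proposal):

```python
from typing import Callable, List

def plan_agent(
    generate_plans: Callable[[str], List[list]],  # LLM with virtue-ethics system prompt
    value_of: Callable[[list], float],            # trained "utility/value" evaluator
    is_prohibited: Callable[[str], bool],         # deontological action filter
    situation: str,
) -> list:
    """Combine the three ethical components into one agent:
    1. an LLM proposes candidate plans (virtue ethics),
    2. a learned evaluator *compares* candidates (consequentialism) --
       note it only ranks plans, it gives no absolute verdict,
    3. plans containing always-prohibited actions are filtered out
       (law / deontology)."""
    candidates = generate_plans(situation)
    allowed = [plan for plan in candidates
               if not any(is_prohibited(action) for action in plan)]
    if not allowed:
        return []  # no permissible plan found
    return max(allowed, key=value_of)
```

With stub components (plans as lists of action strings), the agent picks the highest-valued plan among those passing the filter, which matches the point that the evaluator is only ever comparative.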
How do the desires of possible executors/heirs/etc. factor into this?
Clearly the bet will not auto-extinguish and auto-erase itself regardless of the future desires of anyone.
If you thought I implied that the bet must be settled in purely monetary terms, that wasn't my intention. It's entirely possible for the majority, or entirety, of the bet to be settled with non-monetary currencies, such as social status, reputation, etc.
It's just not all that likely for someone, or their successors, to insist on going down that path.
""AI alignment" has the application, the agenda, less charitably the activism, right in the name."
This seems like a feature, not a bug. "AI alignment" is not a neutral idea. We're not just researching how these models behave, or how minds might be built, out of pure scientific curiosity. It has a specific purpose in mind: to align AIs. Why would we not want this agenda to be part of the name?
I have found a lot of online summaries of deliberate practice frustratingly vague, so I bought a well-reviewed, out-of-print manual on deliberate practice in music called The Practiceopedia. The chapter headings give some idea of the sort of resolution being aimed for. I might do a book review at some point.
Chapter guide
Beginners: curing your addiction to the start of your piece
Blinkers: shutting out the things you shouldn't be working on
Boot camp: where you need to send passages that won't behave
Breakthroughs diary: keeping track of your progress
Bridging: smoothing the bumps between sections
Bug spotting: because you can't fix what you don't know about
Campaigns: connecting your daily practice to the big picture
Cementing: locking in the version you want to keep
Chaining: getting to full speed one segment at a time
Clearing obstacles: finding what causes tricky bits to be tricky
Clock Watchers: curing the unhealthy obsession with time
Closure: knowing when you can safely stop practicing something
Color coding: a whole new dimension to marking your score
Coral reef mistakes: detecting invisible trouble spots
Cosmetics: minimizing the impact of weak capacities on concert day
Countdown charts: factoring your deadlines into your practice
Designer scales: choosing technical work to support your pieces
Details trawl: ensuring you know what's really in the score
Dress rehearsals: setting up your own concert simulator
Engaging autopilot: the dangers of practicing without thinking
Exaggerating: overstating key ideas to embed them
Excuses and ruses: why you'll never really fool your teacher if you haven't practiced
Experimenting: testing different interpretation options
Fire drills: training to cope gracefully with onstage mistakes
Fitness training: behind the scenes practice to help all your pieces
Fresh photocopies: creating your own custom scores tailored for practicing
Horizontal versus vertical: knowing when to change your practice direction
Isolating: stopping problems from interfering with each other
Lesson agenda: setting aside issues to raise at your next lesson
Lesson pre-flight check: finding out if you're on track for next lesson
Lesson review: ensuring last lesson is fresh in your mind while you work
Level system: the astonishing power of Tiny Steps
Marathon week: pushing yourself to find out what's really possible
Metronome method: sneaking up on full tempo
Not wanting to practice: how to manage the biggest practice crisis of all
One-way doors: eliminating the need for constant revision
Openings and endings: VIP attention for the most important parts of any performance
Painting the scene: giving your performances a cinematic edge
Practice Buddies: using the power of competition and cooperation
Practice traps: bad habits that waste your time and wreck your playing
Pressure testing: ensuring you can produce your best playing when it counts most
Randomizing: the ultimate way to end the practice ho hums
Prototypes: building a model of the ideal performance
Recording yourself: finding out and responding to what you really sound like
Recordings: using existing performances to supercharge your preparation
Reflecting: why the best practice sometimes makes no sound at all
Restoration: relearning old pieces without regressing
Rogue cells: when the smallest unit of practice goes bad
Scouting: getting to know your new piece before you start practicing it
Session agenda: creating and working with daily practice to do lists
Shooting the movies: a smarter way to work out what's next
Speeding: the hidden damage caused by practicing too fast
Stalling: what to do when a piece gets stuck
Thematic practice: a powerful alternative to practicing in sections
Tightening: making the leap from good enough to excellent
Triage: when there's too much to do and not enough time to do it
Triggers: setting up cues that get you practicing in the first place
Turnaround time: mastering new pieces in weeks instead of months
Varying your diet: freeing yourself from dull repetitive practice
Visualizing: the most important practice you'll ever do
Your practice sweet spot: setting up the ultimate practice space
A quick prefatory note on how I'm thinking about 'goals' (I don't think it's relevant, but I'm not sure): as I'm modelling things, Sia's desires/goals are given by a function, $U$, from ways the world could be (colloquially, 'worlds') to real numbers, with the interpretation that $U(w)$ is how well satisfied Sia's desires are if $w$ turns out to be the way the world actually is. By 'the world', I mean to include all of history, from the beginning to the end of time, and I mean to encompass every region of space. I assume that this function can be well-defined even for worlds in which Sia never existed or dies quickly. Humans can want to never have been born, and they can want to die. So I'm assuming that Sia can also have those kinds of desires, in principle. So her goal can be achieved even if she's not around.
When I talk about 'goal preservation', I was talking about Sia not wanting to change her desires. I think you're right that that's different from Sia wanting to retain her desires. If she dies, then she hasn't retained her desires, but neither has she changed them. The effect I found was that Sia is somewhat more likely to not want her desires changed.
"Artificial Intention" doesn't sound catchy at all to me, but that's just my opinion.
Personally, I prefer to think of the "Alignment Problem" more generally rather than "AI Alignment". Regardless of who has the most power (humans, AI, cyborgs, aliens, etc.) and who has superior ethics, conflict arises when participants in a system are not all aligned.
Here's a small improvement for me. I open a lot of tabs every day, sometimes to read them later, etc. It would get really disorganized, till I enabled a setting that makes new tabs open to the right of the current one, rather than to the right of all of them. It still gets disorganized, but not as much. Also, now I don't need to scroll all the way to the right on my tab list to get to one I just opened, and can just ctrl+click -> ctrl+tab.
(There may be a better solution for this, like a tab manager addon, though.)
The Wrights invented the airplane using an empirical, trial-and-error approach. They had to learn from experience. They couldn’t have solved the control problem without actually building and testing a plane. There was no theory sufficient to guide them, and what theory did exist was often wrong. (In fact, the Wrights had to throw out the published tables of aerodynamic data, and make their own measurements, for which they designed and built their own wind tunnel.)
This part in particular is where I think there's a whole bunch of useful lessons for alignment to draw from the Wright brothers.
First things first: "They couldn’t have solved the control problem without actually building and testing a plane" is... kinda technically true, but misleading. What makes the Wright brothers such an interesting case study is that they had to solve the large majority of the problem (i.e. "get the large majority of the bits of optimization/information") without building an airplane, precisely because it was very dangerous to test a plane without the ability to control it. Furthermore, they had to do it without reliable theory. And the Wright brothers are an excellent real-world case study in creating a successful design mostly without relying on either robust theory or trial-and-error on the airplane itself.
Instead of just iterating on an airplane, the Wright brothers relied on all sorts of models. They built kites. They studied birds. They built a wind tunnel. They tested pieces in isolation - e.g. collecting their own aerodynamic data. All that allowed them to figure out how to control an airplane, while needing relatively-few dangerous attempts to directly control the airplane. That's where there's lots of potentially-useful analogies to mine for AI. What would be the equivalent of a wind tunnel, for AI control? Or the equivalent of a kite? How did the Wright brothers get their bits of information other than direct tests of airplanes, and what would analogies of those methods look like?
Wouldn't this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)
A few things to note. Firstly, when I say that there's a 'bias' towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low probability to Sia taking steps to eliminate other agents.
Secondly, when I say that a choice "leaves less up to chance", I just mean that the sum total of history is more predictable given that choice than it is given other choices. (I mention this just because you didn't read the post, and I want to make sure we're not talking past each other.)
Thirdly, I would caution against the inference: without humans, things are more predictable; therefore, undertaking to eliminate other agents leaves less up to chance. Even if things are predictable after humans are eliminated, and even if Sia can cook up a foolproof contingency plan for eliminating all humans, that doesn't mean that that contingency plan leaves less up to chance. Insofar as the contingency plan is sensitive to the human response at various stages, and insofar as that human response is unpredictable (or less predictable than humans are when you don't try to kill them all), this bias wouldn't lend any additional probability to Sia choosing that contingency plan.
Fourthly, this bias interacts with the others. Futures without humanity might be futures which involve fewer choices---other deliberative agents tend to force more decisions. So contingency plans which involve human extinction may involve comparatively fewer choicepoints than contingency plans which keep humans around. Insofar as Sia is biased towards contingency plans with more choicepoints, that's a reason to think she's biased against eliminating other agents. I don't have any sense of how these biases interact, or which one is going to be larger in real-world decisions.
Wouldn't this strongly imply biases towards both self-preservation and resource acquisition?
In some decisions, it may. But I think here, too, we need to tread with caution. In many decisions, this bias makes it somewhat more likely that Sia will pursue self-destruction. To quote myself:
Sia is biased towards choices which allow for more choices---but this isn't the same thing as being biased towards choices which guarantee more choices. Consider a resolute Sia who is equally likely to choose any contingency plan, and consider the following sequential decision. At stage 1, Sia can either take a 'safe' option which will certainly keep her alive or she can play Russian roulette, which has a 1-in-6 probability of killing her. If she takes the 'safe' option, the game ends. If she plays Russian roulette and survives, then she'll once again be given a choice to either take a 'safe' option of definitely staying alive or else play Russian roulette. And so on. Whenever she survives a game of Russian roulette, she's again given the same choice. All else equal, if her desires are sampled normally, a resolute Sia will be much more likely to play Russian roulette at stage 1 than she will be to take the 'safe' option.
See the post to understand what I mean by "resolute"---and note that the qualitative effect doesn't depend upon whether Sia is a resolute chooser.
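The quoted Russian-roulette effect can be checked by brute-force enumeration, under two simplifying assumptions not in the original: the game is capped at a finite number of stages, and a resolute plan only specifies choices along the branch where Sia keeps surviving.

```python
def contingency_plans(stages):
    """Enumerate resolute contingency plans for the sequential
    Russian-roulette game with `stages` choice points.  A plan is a
    tuple of choices, one per stage reached if Sia keeps surviving."""
    if stages == 0:
        return [()]
    plans = [("safe",)]  # taking the safe option ends the game
    for rest in contingency_plans(stages - 1):
        plans.append(("roulette",) + rest)
    return plans

plans = contingency_plans(6)
roulette_first = [p for p in plans if p[0] == "roulette"]
# Of the 7 plans, 6 play roulette at stage 1: a uniformly sampled
# resolute plan plays roulette first with probability 6/7.
```

With $n$ stages there are $n+1$ plans, of which $n$ play roulette at stage 1, so a uniformly sampled plan plays roulette first with probability $n/(n+1)$, matching the claim that a resolute Sia is much more likely to play than to take the safe option.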
Arguments that might actually address the cruxes of someone in this reference class might include: [...]
The distribution of outcomes from government interventions is so likely to give you less time, or otherwise make it more difficult to solve the technical alignment problem, that there are fewer surviving worlds where the government intervenes as a result of you asking them to, compared to the counterfactual.
The thing I care more about is quality-adjusted effort, rather than time to solve alignment. For example, I'd generally prefer 30 years to solve alignment with 10 million researchers to 3000 years with 10 researchers, all else being equal. Quality of alignment research comes from a few factors:
I expect early delays to lead to negligible additional alignment progress during the delay, relative to future efforts. For example, halting semiconductor production in 2003 for a year to delay AI would have given us almost no additional meaningful alignment progress. I think the same is likely true for 2013 and even 2018. The main impact would just be to delay everything by a year.
In the future I expect to become more optimistic about the merits of delaying AI, but right now I'm not so sure. I think some types of delays might be productive, such as delaying deployment by requiring safety evaluations. But I'm concerned about other types of delays that don't really give us any meaningful additional quality-adjusted effort.
In particular, the open letter asking for an AI pause appeared to advocate what I consider the worst type of delay: a delay on starting the training of giant models. This type of delay seems least valuable to me for two main reasons.
The first reason is that it wouldn't significantly slow down algorithmic progress, meaning that after the pause ended, people could likely just go back to training giant models almost like nothing happened. In fact, if people anticipate the pause ending, then they're likely to invest heavily and then start their training runs on the date the pause ends, which could lead to a significant compute overhang, and thus sudden progress. The second reason is that, compared to a delay of AI deployment, delaying the start of a training run reduces the quality-adjusted effort that AI safety researchers have, as a result of preventing them from testing alignment ideas on more capable models.
If you think that there are non-negligible costs to delaying AI from government action for any reason, then I think it makes sense to be careful about how and when you delay AI, since early and poorly targeted delays may provide negligible benefits. However, I agree that this consideration becomes increasingly less important over time.
I'm confused about the disagree votes. Can someone who disagree-voted say which of the following claims they disagreed with:
1. Omega criticized the lack of a senior technical expert on Conjecture's team.
2. Omega's primary criticisms of Connor don't have to do with his leadership skills.
3. Omega did not comment on Connor's leadership skills at any point in the post.
In my mind (which might be wrong; I'm not particularly knowledgeable and have not thought about this deeply), the big issue with ads is the attention-economy stuff that Cal Newport talks about. Monetizing (primarily) via ads means that you are competing for eyeballs, which is bad when things like outrage and jealousy prove to attract the most eyeballs instead of things like knowledge and empathy. Well, I guess that doesn't automatically make it bad; it's just an undesirable consequence.
Spoiler: Less than 1% will admit they were wrong. Straight denial, reasoning that it doesn't actually matter, or pretending they knew the whole time that lab origin was possible are all preferable alternatives. Admitting you were wrong is career suicide.
The political investments in natural origin are strong. Trump claiming a Chinese lab was responsible automatically put a large chunk of Americans in the opposite camp. My interest in the topic actually started with reading up to confirm why he was wrong, only to find the Daszak-orchestrated Lancet letter that miscited numerous articles, and the Proximal Origin paper that might be one of the dumbest things I've ever read. The Lancet letter's declaration that "lab origin theories = racist" influenced discourse in a way that cannot be overstated. It also seems many view more deadly viruses as an adjoining component of climate change: a notion that civilizing more square footage of earth means we are inevitably bound to suffer nature's increasing wrath in the form of increasingly virulent, deadly pathogens.
The professional motivations are stark and gross. “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” Thoughts on the origin are frequently dismissed if you're not a virologist. But all the money in virology is in gain of function. Oops!
(cross-posted from EAF, thanks Richard for suggesting. There's more back-and-forth later.)
I'm not very compelled by this response.
It seems to me you have two points on the content of this critique. The first point:
I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit.
I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?
Presumably you would want to say "the team will be good at hits-based research such that we can expect a future hit, for X, Y and Z reasons". I think you should actually say those X, Y and Z reasons so that the authors of the critique can engage with them; I assume that the authors are implicitly endorsing a claim like "there aren't any particularly strong reasons to expect Conjecture to do more impactful work in the future".
The second point:
Your statements about the VCs seem unjustified to me. How do you know they are not aligned? [...] I haven't talked to the VCs either, but I've at least asked people who work(ed) at Conjecture.
Hmm, it seems extremely reasonable to me to take as a baseline prior that the VCs are profit-motivated, and the authors explicitly say
We have heard credible complaints of this from their interactions with funders. One experienced technical AI safety researcher recalled Connor saying that he will tell investors that they are very interested in making products, whereas the predominant focus of the company is on AI safety.
The fact that people who work(ed) at Conjecture say otherwise means that (probably) someone is wrong, but I don't see a strong reason to believe that it's the OP who is wrong.
At the meta level you say:
I do not understand where the confidence with which you write the post (or at least how I read it) comes from.
And in your next comment:
I think we should really make sure that we say true things when we criticize people, quantify our uncertainty, differentiate between facts and feelings and do not throw our epistemics out of the window in the process.
But afaict, the only point where you actually disagree with a claim made in the OP (excluding recommendations) is in your assessment of VCs? (And in that case I feel very uncompelled by your argument.)
In what way has the OP failed to say true things? Where should they have had more uncertainty? What things did they present as facts which were actually feelings? What claim have they been confident about that they shouldn't have been confident about?
(Perhaps you mean to say that the recommendations are overconfident. There I think I just disagree with you about the bar for evidence for making recommendations, including ones as strong as "alignment researchers shouldn't work at organization X". I've given recommendations like this to individual people who asked me for a recommendation in the past, on less evidence than collected in this post.)
I tried to formalize this, using $A \rightarrow B$ as a "poor man's counterfactual", standing in for "if Alice cooperates then so does Bob". This has the odd behaviour of becoming "true" when Alice defects! You can see this as the counterfactual collapsing and becoming inconsistent when its premise is violated. But this does mean we need to be careful about using these.
For technical reasons we upgrade to $\Box A \rightarrow B$, which says "if Alice cooperates in a legible way, then Bob cooperates back". Alice tries to prove this, and legibly cooperates if so. This gives us "Alice legibly cooperates if she can prove that, if she legibly cooperates, Bob would cooperate back". In symbols, $\Box(\Box A \rightarrow B) \rightarrow \Box A$.
Is this okay? What about proving $\neg\Box(\Box A \rightarrow B)$?
Well, actually you can't ever prove that! Because of Löb's theorem.
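For reference, the statement of Löb's theorem in standard provability-logic notation, with $\Box P$ read as "$P$ is provable":

```latex
% Löb's theorem: for any sentence P,
\text{if } \vdash \Box P \rightarrow P \text{, then } \vdash P.
% Gödel's second incompleteness theorem is the special case P = \bot:
% a consistent theory cannot prove its own consistency, \neg\Box\bot.
```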
Outside the system we can definitely see cases where $\Box A \rightarrow B$ is unprovable, e.g. because Bob always defects. But you can't prove this inside the system. You can only prove things like "$\neg \Box_k(\Box A \rightarrow B)$" for finite proof lengths $k$.
I think this is best seen as a consequence of "with finite proof strength you can only deny proofs up to a limited size".
So the construction in math slips by, perhaps because two different weirdnesses are canceling each other out. But in any case I think the underlying idea is pretty trustworthy, and perhaps deserves to be cashed out in better provability math.
If you're saying you can't understand why Libertarians think centralization is bad, that IS a crux and trying to understand it would be a potentially useful exercise.
I am not saying that. Many libertarians think that centralization of power often has bad effects. But trying to argue with libertarians who are advocating for government regulations because they're worried about AI x-risk, by pointing out that government regulation will increase centralization of power w.r.t. AI, is a non-sequitur, unless you do a lot more work to demonstrate how the increased centralization of power works against the libertarian's goals in this case.
Sorry for any confusion. Meta only tested LIMA on their 30 safety prompts, not the other LLMs.
Figure 1 does not show the results from the 30 safety prompts, but instead the results of human evaluations on the 300 test prompts.
‘Dimension hopping’ or ‘dimension manipulation’ could be a solution to the Fermi paradox. The universe could be full of intelligent life that remains silent and (mostly) invisible behind advanced spatial technology.
(the second type refers to more limited hypothetical dimension technology such as creating pocket dimensions, for example, rather than accessing other universes)
Your argument with Alexandros was what inspired this post, actually. I was thinking about whether or not to send this to you directly... guess that wasn't necessary.
I actually agree! As I wrote in my post, "GPT is not an agent, [but] it can “play one on TV” if asked to do so in its prompt." So yes, you wouldn't need a lot of scaffolding to adapt a goal-less pretrained model (what I call an "intelligence forklift") into an agent that does very sophisticated things.
However, this separation into two components - the super-intelligent but goal-less "brain", and the simple "will" that turns it into an agent can have safety implications. For starters, as long as you didn't add any scaffolding, you are still OK. So during most of the time you spend training, you are not worrying about the system itself developing goals. (Though you could still worry about hackers.) Once you start adapting it, then you need to start worrying about this.
The other thing is that, as I wrote there, it does change some of the safety picture. The traditional view of a super-intelligent AI is of the "brains and agency" tightly coupled together, just like they are in a human. For example, if a human is super-good at finding vulnerabilities and breaking into systems, they also have the capability to help fix systems, but I can't just take their brain and fine-tune it on that task. I have to convince them to do it.
However, things change if we don't think of the agent's "brain" as belonging to them, but rather as some resource that they are using. (Just like if I use a forklift to lift something heavy.) In particular it means that capabilities and intentions might not be tightly coupled - there could be agents using capabilities to do very bad things, but the same capabilities could be used by other agents to do good things.
Someone just told me that the solution to conflicting experiments is more experiments. Taken literally this is wrong: more experiments just means more conflict. What we need are fewer experiments. We need to get rid of the bad experiments.
Why expect that future experiments will be better? Maybe if the experimenters read the past experiments, they could learn from them. Well, maybe, but maybe if you read the experiments today, you could figure out which ones are bad today. If you don't read the experiments today and don't bother to judge which ones are better, what incentive is there for future experimenters to make better experiments, rather than accumulating conflict?
Hm. I'm going to take a step back, away from the math, and see if that makes things less confusing.
Let's go back to Alice thinking about whether to cooperate with Bob. They both have perfect models of each other (perhaps in the form of source code).
When Alice goes to think about what Bob will do, maybe she sees that Bob's decision depends on what he thinks Alice will do.
At this junction, I don't want Alice to "recurse", falling down the rabbit hole of "Alice thinking about Bob thinking about Alice thinking about--" and etc.
Instead Alice should realize that she has a choice to make, about who she cooperates with, which will determine the answers Bob finds when thinking about her.
This maneuver is doing a kind of causal surgery / counterfactual-taking. It cuts the loop by identifying "what Bob thinks about Alice" as a node under Alice's control. This is the heart of it, and imo doesn't rely on anything weird or unusual.
The general point that you need to update on the evidence that failed to materialize is in the sequences and is exactly where I expected you to go based on your introductory section.
Why do you think AGI would have a very different architecture from what humans do? I'd expect a lot of similarities, just with different hardware.
At the moment at least, progress on reliability is very slow compared to what we would want. To get a sense of what I mean, consider the case of randomized algorithms. If you have an algorithm $A$ that for every input computes some function $f$ with probability at least 2/3 (i.e., $\Pr[A(x)=f(x)] \geq 2/3$), then if we spend $k$ times more the computation, we can do majority voting and using standard bounds show that the probability of error drops exponentially with $k$ (i.e., $\Pr[A_k(x) \neq f(x)] \leq 2^{-ck}$ or something like that, where $A_k$ is the algorithm obtained by scaling up $A$ to compute it $k$ times and output the plurality value).
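A quick toy simulation of this majority-voting amplification (the 2/3-correct "algorithm" is simulated with a biased coin, and the constants are just for illustration, not from any particular paper):

```python
import random

random.seed(0)

def noisy_algo():
    # toy randomized algorithm: outputs the correct answer (1) with probability 2/3
    return 1 if random.random() < 2 / 3 else 0

def amplified(k=101):
    # run k independent copies and output the plurality (here: majority) vote
    votes = sum(noisy_algo() for _ in range(k))
    return 1 if votes > k / 2 else 0

def error_rate(algo, trials=20000):
    # empirical probability of outputting the wrong answer
    return sum(algo() != 1 for _ in range(trials)) / trials

base_error = error_rate(noisy_algo)  # close to 1/3
amp_error = error_rate(amplified)    # drops to (nearly) zero
```

With k = 101 repetitions, the Chernoff-style bound predicts an error probability far below 1%, versus roughly 33% for a single run.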
This is not something special to randomized algorithms. This also holds in the context of noisy communication and error correcting codes, and many other settings. Often we can get to $1-\delta$ success at a price of $O(\log(1/\delta))$, which is why we can get things like "five nines reliability" in several engineering fields.
In contrast, so far all our scaling laws show that when we scale our neural networks by spending a factor of $k$ more computation, we only get a reduction in the error that looks like $k^{-\alpha}$, so it's polynomial rather than exponential, and even the exponent $\alpha$ of the polynomial is not that great (and in particular smaller than one).
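To make the contrast concrete, here is a toy numeric sketch (the constants c and alpha below are made up for illustration; real scaling-law exponents vary by setting):

```python
def amplification_error(k, c=0.1):
    # repetition/voting regime: error shrinks exponentially, roughly 2^(-c*k)
    return 2.0 ** (-c * k)

def scaling_law_error(k, alpha=0.5):
    # neural-scaling regime: error shrinks polynomially, roughly k^(-alpha), alpha < 1
    return k ** (-alpha)

# spending 1000x more compute:
exp_err = amplification_error(1000)  # astronomically small
poly_err = scaling_law_error(1000)   # still a few percent
```

Under these (illustrative) constants, 1000x more compute takes the amplified algorithm's error to something negligible, while the scaling-law error only drops to around 3%.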
So while I agree that scaling up will yield progress on reliability as well, at least with our current methods, it seems that we would do things that are 10 or 100 times more impressive than what we do now before we get to the type of 99.9%-and-better reliability on the things that we currently do. Getting to do something that is both super-human in capability and has such a tiny probability of failure that it would not be detected seems much further off.
> GPT-2 | 1.5B | 15B | 2.5794
Where does the "15B" for GPT-2's data come from, here? Epoch's dataset's guess is that it was trained on 3B tokens for 100 epochs: https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4/edit#gid=0
The big difference between AI and these technologies is that we're worried about adversarial behavior by the AI.
A more direct analogy would be if Wright & co had been worried that airplanes might "decide" to fly safely until humanity had invented jet engines, then "decide" to crash them all at once. Nuclear bombs do have a direct analogy - a Dr. Strangelove-type scenario in which, after developing an armamentarium in an ostensibly carefully-controlled manner, some madman (or a defect in an automated launch system) triggers an all-out nuclear attack and ends the world.
This is the difficulty, I think. Tech developers naturally want to think in terms of a non-adversarial relationship with their technology. Maybe this is more familiar to biologists like myself than to people working in computer science. We're often working with living things that can multiply, mutate and spread, and which we know don't have our best interests in mind. If we achieve AGI, it will be a living in silico organism, and we don't have a good ability to predict what it's capable of because it will be unprecedented on the earth.
I don't think drift would necessarily be the same for humans and a wildly different intelligence architecture, but it's an interesting way to think about it.
I think maybe you misunderstand the word "crux". A crux is a point where you and another person disagree. If you're saying you can't understand why Libertarians think centralization is bad, that IS a crux, and trying to understand it would be a potentially useful exercise.
A slightly surreal experience to read a post saying something I was just tweeting about, written by a username that could plausibly be mine.
I think worlds with the tools to treat most causes of human death ranks strictly higher than a world without those tools. In the same way that a world with running water ranks above worlds without it. Even today not everyone benefits from running water. If you could go back in time would you campaign against developing pipes and pumps because you believed only the rich would ever have running water? (Which was true for a period of time)
This is quite a story.
I don't think my odds of lab origin are 99% yet, but after this article I'd move my odds from 80% to 90%. I'd like to see confirmation by more sources before I move any higher. But the evidence looks pretty compelling at this point; the narrative is coherent, and the counterarguments (of those I've read) seem weak. Though it's possible I've missed some stronger ones, since most of the people in my information sphere seem to believe the lab leak hypothesis.
Re:
Meta’s commitment to open source
Note that only one of Meta's recent releases has been open-source.
I'm an open-source maintainer myself, though not an absolutist or convinced that eg Llama should have been open-sourced. I do however find it pretty frustrating when these models are incorrectly described as open source (including by Yann LeCun, who ought to know better). As is, we collectively get many of the research benefits, all the misuse, but very little of the commercial innovation or neat product improvements that open-sourcing would bring.
I half-agree with both of you. I do think Hanson's selection pressure paper is a useful first approximation, but it's not clear that the reachable universe is big enough that small deviations from the optimal strategy will actually lead to big differences in amount of resources controlled. And as I gestured towards in the final section of the story, "helping" can be very cheap, if it just involves storing their mind until you've finished expanding.
But I don't think that the example of animals demonstrates this point very well, for two reasons. Firstly, in the long term we'll be optimizing these probes way harder than animals were optimized.
Secondly, a lot of the weird behaviors of animals are a result of needing to compete directly against each other (e.g. by eating each other, or mating with each other). But I'm picturing almost all competition between probes happening indirectly, via racing to the stars. So I think they'll look more directly optimized for speed. (For example, an altruistic probe in direct competition with others would need ways of figuring out when its altruism was being exploited, and then others would try to figure out how to fool it, until the whole system became very unwieldy. By contrast, if the altruism just consists of "in colonizing a solar system I'll take a 1% efficiency hit by only creating non-conscious workers" then that's much more direct.)
Sayash Kapoor and Arvind Narayanan say licensing of models wouldn't work because it is unenforceable, and also that it would stifle competition and worsen AI risks. I notice those two claims tend to correlate a lot, despite it being really very hard for both of them to be true at once – either you are stifling the competition or you're not, although there is a possible failure mode where you harm other efforts but not enough. The claimed 'concentration risks' and 'security vulnerabilities' do not engage with the logic behind the relevant extinction risks.
Making it harder to legally use models accomplishes two things:
Consider the situation with opiates in the US: our attempts to erect legal barriers for people obtaining opiates has indeed reduced the number of people legally obtaining opiates, and probably even reduced total opiate consumption, but at the cost that a lot of people were driven to buy their opiates illegally instead of going through medical channels.
I don't expect computing power sufficient to train powerful models to be easier to control than opiates, in worlds where doom looks like rapid algorithmic advancements that decrease the resource requirements to train and run powerful models by orders of magnitude.
The question is not whether I can pass their ITT: that particular claim doesn't obviously engage with any cruxes that I, or others like me, have related to x-risk. That's the only thing that section is describing.
I make no claim to speak for anyone who isn't me, but I agree with your analysis. I would say similar things about e.g. ESP and miracles and the like.
I'd like to register that I disagree with the claim that standard online RLHF requires adversarial robustness in AIs per se. (I agree that it requires that humans are adversarially robust to the AI, but this is a pretty different problem.)
In particular, the place where adversarial robustness shows up is in sample efficiency. So, poor adversarial robustness is equivalent to poor sample efficiency. My understanding is that the trend is toward higher not lower sample efficiency with scale, so this seems on track.
This same reasoning also applies to recursive oversight schemes or debate.
Overall, my guess is that commercial incentives and normal scaling will cause the sample efficiencies of these processes to be quite high in practice. Additionally, my guess would be that improving adversarial robustness isn't the best way to push on sample efficiency.
That said, I do think there are many important reasons to improve adversarial robustness.
The biggest problem with the argument is that, given our current knowledge about the specific details of extraterrestrial civilizations, the term 'aliens' in P[aliens] does not fulfill the hard-to-vary criteria of a good explanation.
Skeptic: "If it's aliens, why haven't they been trying to contact us"
Post-hoc variation: "Because of the Prime Directive"
Skeptic: "If it's a physical vehicle, why does it not obey the laws of physics"
Post-hoc variation: "Because the aliens have discovered new physics which we don't know about".
etc.. etc..
Any unexplained phenomena, terrestrial or otherwise, can be explained by 'aliens', and for any skeptical counter-argument, the specifics of 'aliens' can be varied to fit the facts.
Therefore, as things stand with our current knowledge, 'aliens' is simply not a good explanation, regardless of the prior and posterior probabilities.
My hand-wavy view is that the 'consciousness' which causes collapse is a very small (collapse-resistant, as Chalmers wrote) object inside the brain. For example, it is the electric potential of the membrane of a single neuron. As a result, everything outside it - the whole universe - is in some sense a Schrödinger cat.
The whole 'macroscopic quantum effects' are interferences between whole universe branches from the view of this small quantum object in the brain. It could be rephrased as: the small quantum object in the brain is itself in complex quantum states, which may sound more plausible.
Because the interference is happening between whole branches, photon-caused decoherence of some objects inside each branch is not relevant.
This is why Everett called his theory the relative-state interpretation of QM: there is a relation (multiplication of state vectors) between two systems, the observer and the universe. Note that the later "many-worlds interpretation" is an oversimplification of this idea, as it excludes interference between branches.
For what it's worth, I think most people I know expect most professed values to be violated most of the time, so they think libertarians advocating for this is perfectly ordinary; the surprising thing would be if professed libertarians weren't constantly showing up advocating for regulating things. Show, don't tell, in politics and ideology. That's not to say professing values is useless, just that there's no inconsistency to be explained here, and if I linked people in my circles to this post, they'd respond with an eyeroll at the possibility that if only they were more libertarian they'd be honest - because the name is most associated with people using the name to lie.
Was "avoiding anti-Chinese sentiment" really a motivation? The official explanation is that some Chinese person ate like a barbecued bat or got bit by a pangolin or something. I don't see how a lab leak would make people any more racist or hateful towards the Chinese than the official explanation did.
I suppose that it probably was a motivation even though it did not make much rational sense to me. I just wonder if that concern was more of a matter of political identity rather than a considered response.
As an OpenAI employee I cannot say too much about short-term expectations for GPT, but I generally agree with most of his subpoints; e.g., running many copies, speeding up with additional compute, having way better capabilities than today, have more modalities than today. All of that sounds reasonable. The leap for me is (a) believing that results in transformative AGI and (b) figuring out how to get these things to learn (efficiently) from experience. So in the end I find myself pretty unmoved by his article (which is high quality, to be sure).
Because they’re under different selection pressures. Take a look at this paper by Robin Hanson: https://mason.gmu.edu/~rhanson/filluniv.pdf . When colonizing unowned space, victory goes to the swiftest, not to the cleverest, most beautiful, or most strategically lazy.
IIRC, this grew out of discussions in which I raised the problem of optimal interstellar colonization strategies. Robin thought about the problem and, with the methods of an economist, settled it decisively. Now this strategy is just part of the background knowledge that the author of this story assumed.
I love stories like this. It's not immediately obvious to me how to translate them to AI—like, what is the equivalent of what the Wright brothers did for AI?—but I think hearing these is helpful for developing the mindset that will create the kinds of precautions necessary to work with AI safely.
Really interesting idea.
We could check already-existing data from e.g. parapsychology for this effect. As I remember, it was observed there that the stronger the controls in the experiments, the smaller the so-called psi-effect, which was usually interpreted as evidence against psi.
But I suspect that the meta-anti-epistemic nature of the phenomenon will appear even in such a setup, and it will produce initially promising but then declining results.
Yes, in particular, more compute means it's easier to automate searches for algorithmic improvements....
Sabine Hossenfelder argues against the idea that decoherence is measurement, e.g. here: http://backreaction.blogspot.com/2019/10/what-is-quantum-measurement-problem.html As I understand it, the main difference from her view is that decoherence is a relation between objects in the system, while measurement is related to the whole system's "collapse".
What she's mainly arguing there is that decoherence does not solve the measurement problem because it does not result in the Born rule without further assumptions. She also links another post where she argues that attempts to derive the Born rule via rational choice theory are non-reductionist.
It might be that she thinks that means that some separate collapse is likely in addition to the separation into a mixture via decoherence, where the collapse selects a particular outcome from the mixture, but even if that were true, such a collapse would, I think, have to occur after or simultaneously with decoherence or it would be observable.
None of this leads, as far as I can tell, to the strange expectations that you seem to have.
For the record, our relationship to supporting events for this ecosystem is changing from something like "all of our resources are the same, here have my venue for free if you need it" to "markets and pricing are a great way for large masses of people to coordinate on the value of a good or service, let's coordinate substantially via trade".
For instance, during a previous cohort of SERI MATS scholars at the Lightcone Offices, I spent a couple of weeks of work adding a second floor and getting it furnished and doing interior design, hiring another support person to the office team, and then later on dealing with closing it down and downsizing when the demand went away. I did all of that for free, and was not paid salary or anything by MATS, it was part of my Lightcone job, because we wanted to support mentorship happening in the AI alignment ecosystem. It's different this time around. They're paying us a substantial amount of money (well over $100k) for the use of 2.5 of our nicely furnished and designed buildings for 2 months, an amount that makes the trade pretty good for Lightcone (and I hope+expect to work hard and make it worthwhile for SERI MATS too!). The other workshops Habryka has mentioned elsethread will also mostly be paying trade partners (general pricing TBD as we get a better sense of the demand).
I bring this up because the extent to which funds for Lightcone are spent supporting SERI MATS in particular (and other teams/orgs/events) is (I suspect) much less than you are thinking.
Scott Alexander post that seems very relevant to your example: The Control Group Is Out Of Control. It puts into question even the heuristic of "Is there much more evidence for [blah] than...".
"If alien aircraft were on Earth, they would need to be carefully calibrated to give us grainy distant glimpses (in every possible way) but never more. If alien aircraft are here, they’re screwing with us."
I don't know - this inference seems rather weak. I try to be on time when I make an appointment, and most of my failures are pretty small - but observing me being late by a minute or two a couple of times does not mean that I calibrate my lateness to a minute or two. Failures would naturally cluster near the borderline.
Plus they might be actively erasing evidence when it is too obvious.
Thanks! I once wrote up a somewhat-parallel discussion on a different topic in Section 5.1 here:
… So this is the “null hypothesis” of what to expect if there’s no such thing as [blah]. By now there are probably ≈1000 person-years of experimental data created by [blah] researchers. In such a huge mountain of data, there is bound to be lots of “random noise and ad hoc misinterpretations” that happen to line up remarkably with researchers’ prior expectations about [blah]. The question is not “Are there results that seems to provide evidence for [blah]?”, but rather “Is there much more evidence for [blah] than could plausibly be filtered out of 1000 person-years of random noise, misinterpretations, experimental errors, bias, occasional fraud, gross incompetence, weird equipment malfunctions, etc.?” …
and I also linked to & excerpted yet another parallel discussion on yet a different topic by Scott Alexander, Section 17 here.
I agree with all of that, but the way you described that interaction sounds like it wouldn't even come close to accomplishing these goals. There's a gap in communication. I'd have to see you do it in person to know if I thought it was working.
Yeah, I thought to note that in the comment that starts this thread; that's not the kind of thing that seems practical when coordinating updating in an informal way. So more carefully, the intended scope of the comment is formal updating (computing of credences) that's directed informally (choosing the potential observations and hypotheses to pay attention to).
it only works when you are able to reduce social anxiety by showing that they're welcome. someone who is cripplingly anxious typically wants to feel like they're safe, so showing them a clearer map to safety includes detecting the structure of their social anxiety first and getting in sync with it. then you can show them they're welcome in a way that makes them feel safer, not less. to do this requires gently querying their anxiety's agentic target and inviting the group to behave in ways that satisfy what their brain's overactivation wants.
Yep, and I recognize that later in the article:
The paperclip maximizer problem that we discussed earlier was actually initially proposed not as an outer alignment problem of the kind that I presented (although it is also a problem of choosing the correct objective function/outer alignment). The original paperclip maximizer was an inner alignment problem: what if in the course of training an AI, deep in its connection weights, it learned a “preference” for items shaped like paperclips.
But it's still useful as an outer alignment intuition pump.
The wealthy are not powerful enough to "hoard" treatments, because Medicare et al represent the government, which has a monopoly on violence and incentives to not allow such hoarding.
That's naive. If a private actor has an obedient ASI, they also have a monopoly on violence now. If labour has become superfluous, states have lost all incentive to care about people's opinions.
So the idea is to use "Artificial Intention" to specifically speak of the subset of concerns about what outcomes an artificial system will try to steer for, rather than the concerns about the world-states that will result in practice from the interaction of that artificial system's steering plus the steering of everything else in the world?
Makes sense. I expect it's valuable to also have a term for the bit where you can end up in a situation that nobody was steering for due to the interaction of multiple systems, but explicitly separating those concerns is probably a good idea.
As I disclaimed, the frame of the post does rule out relevance of this point, it's not a response to the post's interpretation that has any centrality. I'm more complaining about the background implication that rewards are good (this is not about happiness specifically). Just because natural selection put a circuit in my mind, doesn't mean I prefer to follow its instruction, either in ways that natural selection intended, or in ways that it didn't. Human misalignment relative to natural selection doesn't need to go along with rewards at all, let alone seeking superstimulus. Rewards probably play some role in the process of figuring out what is right, but there is no robust reason for their contribution to even be pointing in the obvious direction.
Fair enough. (though...really you could in principle still handle filtered evidence in a formalish way. It just would require a bunch of additional complication regarding your priors and evidence on how the filter operates).
Nice overview! I mostly agree.
>What I do not expect is something I’d have been happy to pay $500 or $1,000 for, but not $3,500. Either the game will be changed, or it won’t be changed quite yet. I can’t wait to find out.
From context, I assume you're saying this about the current iteration?
I guess willingness to pay for different things depends on one's personal preferences, but here's an outcome that I find somewhat likely (>50%):
there's a sort of anthropic issue where if we already had compelling evidence (or no evidence) we wouldn't be having this discussion.
Yes, our discussion is based on the evidence we actually see. But, to then discount the evidence because if we had different evidence we wouldn't be having the same discussion, is to rule out updating on evidence at all, if that evidence would influence our discussion.
Is there a prior for the likely resolution of fuzzy evidence in general?
In my view, there is a general tendency to underestimate the likelihood of encountering weird-seeming evidence, and especially of encountering it indirectly via a filtering process where the weirdest and most alien-congruent evidence (or game-of-telephone enhanced stories) gets publicly disseminated. For this reason, a bunch of fuzzy evidence is not particularly strong evidence for aliens.
Thanks for your comment, I think it raises an important point if I understood it correctly. But I'm not sure if I have understood it correctly. Are you saying that by doing random things that make other people happy, I would be messing with their reward function? So that I would, for example, reward and thus incentivise random other things the person doesn't really value?
In writing this, I had indeed assumed that while happiness is probably not the only valuable thing and we wouldn't want to hook everybody up to a happiness machine, the marginal bit of happiness in our world would be positive and quite harmless. But maybe superstimuli are a counterexample to that? I have to think about it more.
Sorry for the late reply; I haven't commented much on LW and didn't appreciate the time it would take for someone to reply to me, so I missed this until now. If I reply to you, Ape in the coat, does that notify dr_s too?
If I understand dr_s's quotation, I believe he's responding to the post I referenced. How Many Lives Does X-Risk Work Save from Non-Existence includes pretty early on:
Whenever I say "lives saved" this is shorthand for “future lives saved from nonexistence.” This is not the same as saving existing lives, which may cause profound emotional pain for people left behind, and some may consider more tragic than future people never being born.[6]
I assume a zero-discount rate for the value of future lives, meaning I assume the value of a life is not dependent on when that life occurs.
It seems pretty obvious to me that in almost any plausible scenario, the lifespan of a distant future entity with moral weight will be very different from what we currently think of as a natural life span (rounded to 100 years in the post I linked), but making estimates in terms of "lives saved from non existence" where life = 100 years is useful for making comparisons to other causes like "lives saved per $1,000 via malaria bed nets." It also seems appropriate for the post not to assume a discount rate and to leave that to the reader to apply themselves on top of the estimates presented.
I prefer something like "observer moments that might not have occurred" to "lives saved." I don't have strong preferences between a relatively small number of entities having long lives or more numerous entities having shorter lives, so long as the quality of the life per moment is held constant.
As for dr_s's "How bad can a life be before the savings actually counts as damning", this seems easily resolvable to me by just allowing "people" of the far future the right to commit suicide, perhaps after a short waiting period. This would put a floor on the suffering they experience if they can't otherwise be guaranteed to have great lives.
I imagine this will relax over time, like the early iPhone didn't allow any access for apps to the phonecall hardware.
>All accounts agree that Apple has essentially solved issues with fit and comfort.
Besides the 30min point, is it really true that all accounts agree on that? I definitely remember reading in at least two reports something along the lines of, "clearly you can't use this for hours, because it's too heavy". Sorry for not giving a source!
Sure, but that's not about formal-ish updating that frames this post, where you are writing down likelihood ratios and computing credences.
Agreed that paying attention to how evidence is filtered is super important. But, in principle, you can still derive conclusions from filtered evidence. It's just really hard, especially if the filter is strong and hard to characterize (as is the case with UAPs).
> Meta question: If you think there is a 1 in 1000 chance that you are wrong
I don't think that credence is well thought of that way. Attempts to change my mind might change my credence even if they don't change it to me thinking that a natural origin would be the most likely.
I don't trust Seymour Hersh's anonymous sources more than 70/30, even when The New Yorker publishes his pieces.
My own beliefs don't rest on a single piece. I don't think that anyone should hold credence that is as high as mine just because they read this article.
Like, what are the odds that the anonymous sources are members of the intelligence community who are saying it now as part of the [CIA's, NSA's, whatever's] current political strategy relative to China?
If that's the CIA's position, they could have just changed the official CIA position and said "We uncovered new evidence and now believe that the lab leak theory is more likely"; there would have been no reason to tell a story about how they overruled their own analysts to hide the lab leak theory. The story as it stands damages the reputation of those agencies, and I think "The CIA does what's good for the CIA" is a good heuristic to think about their actions.
Do philosophers commonly use the word "intention" to refer to mental states that have intentionality, though? For example, from the SEP article on intentionality:
>intention and intending are specific states of mind that, unlike beliefs, judgments, hopes, desires or fears, play a distinctive role in the etiology of actions. By contrast, intentionality is a pervasive feature of many different mental states: beliefs, hopes, judgments, intentions, love and hatred all exhibit intentionality.
(This is specifically where it talks about how intentionality and the colloquial meaning of intention must not be confused, though.)
Ctrl+F-ing through the SEP article gives only one mention of "intention" that seems to refer to intentionality. ("The second horn of the same dilemma is to accept physicalism and renounce the 'baselessness' of the intentional idioms and the 'emptiness' of a science of intention.") The other few mentions of "intention" seem to be about the colloquial meaning. The article seems to generally avoid the word "intention"; it mostly uses "intentional" and "intentionality".
Incidentally, there's also an SEP article on "intention" that does seem to be about what one would think it to be about. (E.g., the first sentence of that article: "Philosophical perplexity about intention begins with its appearance in three guises: intention for the future, as I intend to complete this entry by the end of the month; the intention with which someone acts, as I am typing with the further intention of writing an introductory sentence; and intentional action, as in the fact that I am typing these words intentionally.")
So as long as we don't call it "artificial intentionality research" we might avoid trouble with the philosophers after all. I suppose the word "intentional" becomes ambiguous, however. (It is used >100 times in both SEP articles.)
We can consider whatever, there is no fundamental duty to only think in particular ways. The useful constraints are on declaring something a claim of fact, not muddying epistemic commons or damaging decision relevant considerations; and in large quantities, on what makes terrible training data for the brain, damaging the aspects with known good properties. Everything else is work in progress, with boundaries impossible to codify while remaining on human level.
Some thinking processes seem to be more useful for arriving at true or useful results; paying attention to that property of processes is rationality. This doesn't disqualify processes of which we know less, that would be throwing away the full current force of your mind.
The other comment is about updating and credences. I'm not engaging in updating or credences in this thread.
Thanks for this detailed response; I found it quite helpful. I maintain my "yeah, they should probably get as much funding as they want" stance. I'm especially glad to see that Lightcone might be interested in helping people stay sane/grounded as many people charge into the policy space.
I ended up deciding to instead publish a short post, expecting that people will write a lot of questions in the comments, and then to engage straightforwardly and transparently there, which felt like a way that was more likely to end up with shared understanding.
This seems quite reasonable to me. I think it might've been useful to include something short in the original post that made this clear. I know you said "also feel free to ask any questions in the comments"; in an ideal world, this would probably be enough, but I'm guessing this isn't enough given power/status dynamics.
For example, if ARC Evals released a post like this, I expect many people would experience friction that prevented them from asking (or even generating) questions that might (a) make ARC Evals look bad, (b) make the commenter seem dumb, or (c) potentially worsen the relationship between the commenter and ARC Evals.
To Lightcone's credit, I think Lightcone has maintained a (stronger) reputation of being fairly open to objections (and not penalizing people for asking "dumb questions" or something like that), but the Desire Not to Upset High-status People or Desire Not to Look Dumb In Front of Your Peers By Asking Things You're Already Supposed to Know are strong.
I'm guessing that part of why I felt comfortable asking (and even going past the "yay, I like Lightcone and therefore I support this post" to the mental motion of "wait, am I actually satisfied with this post? What questions do I have") is that I've had a chance to interact in-person with the Lightcone team on many occasions, so I felt considerably less psychological friction than most.
All things considered, perhaps an ideal version of the post would've said something short like "we understand we haven't given any details about what we're actually planning to do or how we'd use the funding. This is because Oli finds this stressful. But we actually really want you to ask questions, even "dumb questions", in the comments."
(To be clear I don't think the lack of doing this was particularly harmful, and I think your comment definitely addresses this. I'm nit-picking because I think it's an interesting microcosm of broader status/power dynamics that get in the way of discourse, and because I expect the Lightcone team to be unusually interested in this kind of thing.)
It's not a take that I've thought about deeply, but could the evidence be explained by a technological advancement: the ability to hop between diverging universes?
It would explain why we don't see aliens: they discover the technology, and find that empty parallel worlds are closer in terms of energy expenditure than other star systems.
It could also explain why the interlopers don't bother us much; they are scouting for uninhabited parallel earths with easily-accessible resources, and skipping those with a population. The only ones we see are the ones incompetent or unlucky enough to crash.
It would explain why aliens aren't ridiculously outclassing us technologically. They don't have to solve interstellar travel before they start hopping.
It would provide an alternate explanation for why aliens 'look like us'; they are from timelines with varying amounts of divergence. (The default explanation of course being that we are primed to see humans everywhere, so our imagined monsters look human.)
I can easily think of a few arguments against this possibility.
If dimension hoppers aren't far ahead of us technologically, trading with us has advantages. Why skip past us instead of trading openly?
Technology would probably continue to advance. Hyper-advanced dimension hoppers should be better capable of scouting dimensions, and of displacing populated worlds, and yet we don't see them. (Perhaps they are better at hiding, but then, they don't need to hide.)
Instead of 'where is the alien AI' we are now left with 'where is the divergent timeline AI'.
That last one in particular makes me think this explanation isn't likely. I'd expect rogue AI and self-replicating machines to be invading constantly.
As I understand it, the main difference from her view is that decoherence is a relation between objects within the system, while measurement concerns the "collapse" of the whole system.
I think I would agree that "decoherence does not solve the measurement problem", since the measurement problem has different sub-problems. One corresponds to the measurement postulate, which different interpretations address differently and which Sabine Hossenfelder is mostly referring to in the video. The other is the question of why the typical measurement result looks like a classical world, and this is where decoherence is extremely powerful: it works so well that we do not have any measurements that manage to distinguish between the competing hypotheses.
With regards to her example of Schrödinger's cat, this means that the bare superposition $\tfrac{1}{\sqrt{2}}\left(|\text{alive}\rangle + |\text{dead}\rangle\right)$ will not actually occur. It will always be a state in which the environment must be part of the equation, so that the state is more like $|\text{alive}\rangle|E_{\text{alive}}\rangle + |\text{dead}\rangle|E_{\text{dead}}\rangle$ after a nanosecond, and already includes any surrounding humans after a microsecond (light has gone 300 m in all directions by then). When human perception starts being relevant, the observer is already part of the entangled state. With regards to the first part of the measurement problem, this is not yet a solution. As such I would agree with Sabine Hossenfelder. But it does take away a lot of the weirdness, because there is no branch of the wave function that contains non-classical behaviour[1].
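To sketch why decoherence suppresses interference so effectively (a standard textbook illustration, not something specific to the video under discussion, with $E_a$, $E_d$ as illustrative labels for the environment states): coupling the cat to its environment makes the interference terms of the cat's reduced density matrix proportional to the overlap of the environment states, and that overlap decays essentially instantly.

```latex
% Cat entangled with environment E:
%   |\Psi\rangle = \alpha\,|\text{alive}\rangle|E_a\rangle
%                + \beta\,|\text{dead}\rangle|E_d\rangle
\begin{align*}
\rho_{\text{cat}} &= \operatorname{Tr}_E\, |\Psi\rangle\langle\Psi| \\
  &= |\alpha|^2\, |\text{alive}\rangle\langle\text{alive}|
   + |\beta|^2\, |\text{dead}\rangle\langle\text{dead}| \\
  &\quad + \alpha\beta^*\, \langle E_d | E_a \rangle\,
     |\text{alive}\rangle\langle\text{dead}| + \text{h.c.}
\end{align*}
% The off-diagonal (interference) terms carry the factor
% \langle E_d | E_a \rangle, which shrinks roughly exponentially in the
% number of environmental degrees of freedom, leaving an effectively
% classical mixture of "alive" and "dead".
```

This is why, per the comment above, no branch ends up containing observable non-classical behaviour, even though nothing in the formalism ever invokes a collapse.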
Wigner's friend.
You got me here. I did not follow the large debate around Wigner's friend, as i) this is not a topic I should spend huge amounts of time on, and ii) my expectation was that these debates would "boil down to normality" once I managed to understand all of the details of what is being discussed anyway.
It can of course be that people would convince me otherwise, but until that happens I do not see how these types of situations could lead to strange behaviour that isn't already part of well-established examples such as Schrödinger's cat. Structurally, they only differ in that there are multiple subsequent 'measurements', and this can only create new problems if the formalism used for measurements is the source. I am confident that the many-worlds and Bohmian interpretations do not lead to weirdness in measurements[2], so I am as yet not convinced.
I think (I give it maybe 30 percent probability) that the general nature of the UFO phenomenon is that it is anti-epistemic.
Thanks for clarifying! (I take this to be mostly 'b) physical world' in that it isn't 'humans have bad epistemics') Given the argument of the OP, I would at least agree that the remaining probability mass for UFOs/weirdness as a physical thing is on the cases where the weird things do mess with our perception, sensors and/or epistemics.
The difficult thing about such hypotheses is that they can quickly evolve to being able to explain anything and becoming worthless as a world-model.
You've convinced me! I don't want to defend the claim you quoted, so I'll modify "arguably" into something much weaker.
Given real aliens, how can you be sure of making any claims at all about their civilization/technology/culture/anything without having the sort of observational evidence that would be necessary to make such claims?
We're in Cartesian Demon territory when discussing these theoretical others. We can plop our human notions on top of them all we want, but unless we have direct, observable evidence of the way "they" think/operate/whatever, we can just as easily assume any given conclusion about them as just as likely as any other. And that includes all the N conclusions we haven't even thought of (or simply can't conceive of due to our necessarily human viewpoint).
It seems wildly overconfident to make any claims about them at all that aren't completely hypothetical in the way you describe in your other reply here. Your idea that they either have to have capped tech or be actively trolling is itself just a hypothesis at best, and an idea at worst.
All filtered evidence is good for is formulating hypotheses, or even just inspiring ideas that are not hypotheses.
I hear that rant. I do my best now to reconcile things into a cohesive whole in text, but that means the timestamps are gone, so if I miss something later I can't tell.
I am skeptical that we're near the end of the useful road on data/compute, although I agree that in the 2030 timeframe that's not where the low-hanging fruit will mostly be. My prediction that this '2030' arrives by 2027 is based largely on other avenues of improvement.
To the extent compute/data help, I think of them more as enabling other things.
Meta question: If you think there is a 1 in 1000 chance that you are wrong, why would I spend any amount of time trying to change your mind? I am 99.9 percent confident in very few propositions outside of arithmetic.
Like, what are the odds that the anonymous sources are members of the intelligence community who are saying it now as part of the [CIA's, NSA's, whatever's] current political strategy relative to China? I don't trust Seymour Hersh's anonymous sources more than 70/30, even when The New Yorker publishes his pieces.
Sam Altman (in context of risks of AI): “What I lose the most sleep over is the hypothetical idea that we already have done something really bad by launching ChatGPT.”
That seems misplaced, in the sense that launching OpenAI and its general path of development seem like the places to be most worried you are in error, unless the error in ChatGPT is ‘get everyone excited about AI.’ Which is a different risk model that has many implications. I would have liked to hear more details of the threat model here.
I can't speak to Sam's threat model, but my current threat model is that the greatest risk comes from open-source code and models, or from non-safety-conscious labs. I think our best hope is for a safety-conscious lab to get to a powerful AGI first and keep it safely contained, then study it and make iterative progress on aligning it well enough to use it to undertake a pivotal act. If this roughly corresponds to Sam's view, then I think what he said about ChatGPT would make sense, since the hype seems to be greatly speeding up open-source development and drawing in more funding for new attempts. Especially if he, like me, believes that there are algorithmic advances to be discovered that are so powerful they could allow small orgs to leapfrog all the big ones overnight. In such a scenario, the more independent groups making a serious try in approximately the right direction, the more 'lottery tickets' Moloch gets to buy in the kill-us-all lottery.
I think that's better called simply a coordination or cooperation problem. "Alignment" has the unfortunate implication of coming off as one party wanting to forcefully change the other. With AI it's fine, because if you're creating a mind from scratch, it would be the height of stupidity to create an enemy.