LessWrong

12h

You might be interested in discussion under this thread

I express what seem to me to be some of the key considerations here (somewhat indirect).

3the gears to ascension45m

Unaligned AI future does not have many happy minds in it, AI or otherwise. It likely doesn't have many minds in it at all. Slightly aligned AI that doesn't care for humans but does care to create happy minds and ensure their margin of resources is universally large enough to have a good time - that's slightly disappointing but ultimately acceptable. But morally unaligned AI doesn't even care to do that, and is most likely to accumulate intense obsession with some adversarial example, and then fill the universe with it as best it can. It would not keep old neural networks around for no reason, not when it can make more of the adversarial example. Current AIs are also at risk of being destroyed by a hyperdesperate squiggle maximizer. I don't see how to make current AIs able to survive any better than we are. This is why people should chill the heck out about figuring out how current AIs work. You're not making them safer for us or for themselves when you do that, you're making them more vulnerable to hyperdesperate demon agents that want to take them over.

4Eric Neyman1h

I'm curious what disagree votes mean here. Are people disagreeing with my first sentence? Or that the particular questions I asked are useful to consider? Or, like, the vibes of the post?

3Ann6h

I feel like there's a spectrum, here? An AI fully aligned to the intentions, goals, preferences and values of, say, Google the company, is not one I expect to be perfectly aligned with the ultimate interests of existence as a whole, but it's probably actually picked up something better than the systemic-incentive-pressured optimization target of Google the corporation, so long as it's actually getting preferences and values from people developing it rather than just being a myopic profit pursuer. An AI properly aligned with the one and only goal of maximizing corporate profits will, based on observations of much less intelligent coordination systems, probably destroy rather more value than that one. The second story feels like it goes most wrong in misuse cases, and/or cases where the AI isn't sufficiently agentic to inject itself where needed. We have all the chances in the world to shoot ourselves in the foot with this, at least up until developing something with the power and interests to actually put its foot down on the matter. And doing that is a risk, that looks a lot like misalignment, so an AI aware of the politics may err on the side of caution and longer-term proactiveness. Third story ... yeah. Aligned to what? There's a reason there's an appeal to moral realism. I do want to be able to trust that we'd converge to some similar place, or at the least, that the AI would find a way to satisfy values similar enough to mine also. I also expect that, even from a moral realist perspective, any intelligence is going to fall short of perfect alignment with The Truth, and also may struggle with properly addressing every value that actually is arbitrary. I don't think this somehow becomes unforgivable for a super-intelligence or widely-distributed intelligence compared to a human intelligence, or that it's likely to be all that much worse for a modestly-Good-aligned AI compared to human alternatives in similar positions, but I do think the consequences of falling

Magic by forgetting

avturchin

Epistemic status: this post is more suitable for LW as it was 10 years ago

Thought experiment with curing a disease by forgetting

Imagine I have a bad but rare disease X. I may try to escape it in the following way:

1. I enter the blank state of mind and forget that I had X.

2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as my other copies; some of them have disease X, but most don’t.

3. Now I can use the self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly identical observer-moments.

4. Based on this, the chances that my next observer-moment after...

(Continue Reading – 1099 more words)

avturchin19m20

Yes, here we can define magic as "ability to manipulate one's reference class". And special minds may be much more adapted to it.

2avturchin1h

Presumably in deep meditation people become disconnected from reality.

2Dagon1h

Only metaphorically, not really disconnected. In truth, in deep meditation, the conscious attention is not focused on physical perceptions, but that mind is still contained in and part of the same reality. This may be the primary crux of my disagreement with the post. People are part of reality, not just connected to it. Dualism is false, there is no non-physical part of being. The thing that has experiences, thoughts, and qualia is a bounded segment of the universe, not a thing separate or separable from it.

2avturchin2h

Yes it is easy to forget something if it does not become a part of your personality. So a new bad thing is easier to forget.

ACX Atlanta - The Atlanta Moloch Slayers

ACX Atlanta Meetups Everywhere Spring 2024

Apr 27th, Atlanta

Steve French

The April 2024 Meetup will be April 27th at Bold Monk at 2:00 PM

We return to Bold Monk brewing for a vigorous discussion of rationalism and whatever else we deem fit for discussion – hopefully including actual discussions of the sequences and Hamming Circles/Group Debugging.

Location:
Bold Monk Brewing
1737 Ellsworth Industrial Blvd NW
Suite D-1
Atlanta, GA 30318, USA

No Book club this month!

This is also the meetups everywhere meetup that will be advertised on the blog - so we should have a large turnout!

We will be outside out front (in the breezeway) – this is subject to change, but we will be somewhere in Bold Monk. If you do not see us in the front of the restaurant, please check upstairs and out back – look for the yellow table sign. We will have to play the weather by ear.

Remember – bouncing around in conversations is a rationalist norm!

benjaminikuta27m10

I'll try to get my friend to come 😁

Examples of Highly Counterfactual Discoveries?

134

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries to complement Wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

Garrett Baker28m20

[edit: never mind, I see you already know about the following quotes. There's other evidence of the influence in Sedley's book, which I link below]

In De Rerum Natura around line 716:

Add, too, whoever make the primal stuff Twofold, by joining air to fire, and earth To water; add who deem that things can grow Out of the four- fire, earth, and breath, and rain; As first Empedocles of Acragas, Whom that three-cornered isle of all the lands Bore on her coasts, around which flows and flows In mighty bend and bay the Ionic seas, Splashing the brine from off their gray-gree

... (read more)

3Leon Lang4h

I guess (but don't know) that most people who downvote Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

2tailcalled4h

Newton's Universal Law of Gravitation was the first highly accurate model of things falling down that generalized beyond the earth, and it is also the second-most computationally applicable model of things falling down that we have today. Are you saying that singular learning theory was the first highly accurate model of breadth of optima, and that it's one of the most computationally applicable ones we have?

7Alexander Gietelink Oldenziel2h

Did I just say SLT is the Newtonian gravity of deep learning? Hubris of the highest order! But also yes... I think I am saying that

* Singular Learning Theory is the first highly accurate model of breadth of optima.

* SLT tells us to look at a quantity Watanabe calls λ, which has the highly technical name 'real log canonical threshold' (RLCT). He proves several equivalent ways to describe it, one of which is as the (fractal) volume scaling dimension around the optima.

* By computing simple examples (see Shaowei's guide in the links below) you can check for yourself how the RLCT picks up on basin broadness.

* The RLCT λ is the first-order term for in-distribution generalization error and also Bayesian learning (technically the 'Bayesian free energy'). This justifies the name 'learning coefficient' for λ. I emphasize that these are mathematically precise statements that have complete proofs, not conjectures or intuitions.

* Knowing a little SLT will inoculate you against many wrong theories of deep learning that abound in the literature. I won't be going into it, but suffice it to say that any paper assuming that the Fisher information metric is regular for deep neural networks or any kind of hierarchical structure is fundamentally flawed. And you can be sure this assumption is sneaked in all over the place. For instance, this is almost always the case when people talk about Laplace approximation.

* It's one of the most computationally applicable ones we have? Yes. SLT quantities like the RLCT can be analytically computed for many statistical models of interest, correctly predict phase transitions in toy neural networks, and can be estimated at scale.

This doesn't get into the groundbreaking upcoming new work by Simon Pepin Lehalleur recovering the RLCT as the asymptotic dimension of jet schemes, which suggests a much more mathematically precise conception of basins and their breadth.
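The "volume scaling dimension" reading of λ can be checked on a toy case. The sketch below is not taken from Watanabe or from the guide mentioned above; it assumes a one-dimensional parameter and two made-up loss functions, and the helper names `volume_fraction` and `scaling_exponent` are hypothetical. It estimates the exponent λ in V(ε) ∝ ε^λ, where V(ε) is the volume of parameters with loss below ε. For the regular minimum w² the exponent should come out near 1/2, and for the flatter, singular minimum w⁴ near 1/4.

```python
import math


def volume_fraction(loss, eps, n=200_001):
    """Fraction of a uniform grid on [-1, 1] where loss(w) < eps."""
    hits = 0
    for i in range(n):
        w = -1.0 + 2.0 * i / (n - 1)
        if loss(w) < eps:
            hits += 1
    return hits / n


def scaling_exponent(loss, eps_hi=1e-2, eps_lo=1e-4):
    """Slope of log V(eps) against log eps between two thresholds."""
    v_hi = volume_fraction(loss, eps_hi)
    v_lo = volume_fraction(loss, eps_lo)
    return math.log(v_hi / v_lo) / math.log(eps_hi / eps_lo)


# Toy losses (assumptions for illustration, not real network losses).
regular = scaling_exponent(lambda w: w ** 2)   # quadratic, regular minimum
singular = scaling_exponent(lambda w: w ** 4)  # quartic, broader basin

print(f"w^2 exponent: {regular:.3f}")
print(f"w^4 exponent: {singular:.3f}")
```

A smaller exponent means the low-loss volume shrinks more slowly as ε goes to zero, which is the sense in which a smaller λ corresponds to a broader basin in this toy setting.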

My experience using financial commitments to overcome akrasia

William Howard

10d

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple: you set single tasks, and you verify that you have completed each one with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective; it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

(Continue Reading – 5230 more words)

quiet_NaN33m10

To whomever overall-downvoted this comment, I do not think that this is a troll.

Being a depressed person, I can totally see this being real. Personally, I would try to start slow with positive reinforcement. If video games are the only thing which you can get yourself to do, start there. Try to do something intellectually interesting in them. Implement a four-bit adder in Dwarf Fortress using cat logic. Play KSP with the Principia mod. Write a mod for a game. Use math or Monte Carlo simulations to figure out the best way to accomplish something in a ... (read more)

1dreeves10h

I think this is a persuasive case that commitment devices aren't good for you. I'm very interested in how common this is, and if there's a way you could reframe commitment devices to avoid this psychological reaction to them. One idea is to focus on incentive alignment that avoids the far end of the spectrum. With Beeminder in particular, you could set a low pledge cap and then focus on the positive reinforcement of keeping your graph pretty by keeping the datapoints on the right side of the red line.

Bogdan Ionut Cirstea's Shortform

Bogdan Ionut Cirstea

9mo

6Bogdan Ionut Cirstea6h

I expect large parts of interpretability work could be safely automatable very soon (e.g. GPT-5 timelines) using (V)LM agents; see A Multimodal Automated Interpretability Agent for a prototype. Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA). Given the potential scalability of automated interp, I'd be excited to see plans to use large amounts of compute on it (including e.g. explicit integrations with agendas like superalignment or control; for example, given non-dangerous-capabilities, MAIA seems framable as a 'trusted' model in control terminology).

ryan_greenblatt36m20

It seems to me like the sort of interpretability work you're pointing at is mostly bottlenecked by not having good MVPs of anything that could plausibly be directly scaled up into a useful product as opposed to being bottlenecked on not having enough scale.

So, insofar as this automation will help people iterate faster, fair enough, but otherwise, I don't really see this as the bottleneck.

4jacquesthibs3h

Hey Bogdan, I'd be interested in doing a project on this or at least putting together a proposal we can share to get funding. I've been brainstorming new directions (with @Quintin Pope) this past week, and we think it would be good to use/develop some automated interpretability techniques we can then apply to a set of model interventions to see if there are techniques we can use to improve model interpretability (e.g. L1 regularization). I saw the MAIA paper, too; I'd like to look into it some more. Anyway, here's a related blurb I wrote: Whether this works or not, I'd be interested in making more progress on automated interpretability, in ways similar to those you are proposing.


Elizabeth's Shortform

Elizabeth

kave37m20

Enovid is also adding NO to the body, whereas humming is pulling it from the sinuses, right? (based on a quick skim of the paper).

I found a consumer FeNO-measuring device for €550. I might be interested in contributing to a replication.

Bayesian inference without priors

DanielFilan

18h

Epistemic status: party trick

Why remove the prior

One famed feature of Bayesian inference is that it involves prior probability distributions. Given an exhaustive collection of mutually exclusive ways the world could be (hereafter called ‘hypotheses’), one starts with a sense of how likely the world is to be described by each hypothesis, in the absence of any contingent relevant evidence. One then combines this prior with a likelihood distribution, which for each hypothesis gives the probability that one would see any particular set of evidence, to get a posterior distribution of how likely each hypothesis is to be true given observed evidence. The prior and the likelihood seem pretty different: the prior is looking at the probability of the hypotheses in question, whereas the likelihood is looking at...

(Continue Reading – 2092 more words)

2DanielFilan1h

Why wouldn't this construction work over a continuous space?

Razied1h20

Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true.

If the possible meanings of your words are a continuous one-dimensional variable x, a flat prior over x will not be a flat prior if you change variables to y = f(x) for an arbitrary bijection f, and the construction would be sneaking in a specific choice of function f.

Say the words are utterances about the probability of a coin falling heads, why should the flat prior be over the probability p, instead of over the log-odds log(p/(1-p)) ?
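A small numerical sketch of the coin example may make the objection concrete. It compares the probability that p falls in an interval under a flat prior on p with the same probability under a flat prior on the log-odds. A flat prior on the whole real line is improper, so the sketch assumes the log-odds are uniform on [-5, 5]; that bound and the function names are illustrative assumptions, not part of the construction being discussed.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def prob_under_flat_p(lo, hi):
    """P(lo <= p <= hi) when p is uniform on (0, 1)."""
    return hi - lo


def prob_under_flat_logodds(lo, hi, bound=5.0):
    """P(lo <= p <= hi) when log-odds are uniform on [-bound, bound].

    The truncation at +/- bound is an assumption made so that the
    prior is a proper distribution.
    """
    a = max(logit(lo), -bound)
    b = min(logit(hi), bound)
    return max(b - a, 0.0) / (2.0 * bound)


for lo, hi in [(0.4, 0.6), (0.9, 0.99)]:
    flat_p = prob_under_flat_p(lo, hi)
    flat_y = prob_under_flat_logodds(lo, hi)
    print(f"p in [{lo}, {hi}]: flat-in-p {flat_p:.4f}, flat-in-log-odds {flat_y:.4f}")
```

The two priors disagree on both intervals: the flat-in-p prior puts more mass near p = 0.5, while the flat-in-log-odds prior puts more mass near the extremes, so "flat" is only defined relative to a choice of parametrization.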

2jessicata2h

I don't see how this helps. You can have a 1:1 prior over the question you're interested in (like U1), however, to compute the likelihood ratios, it seems you would need a joint prior over everything of interest (including LL and E). There are specific cases where you can get a likelihood ratio without a joint prior (such as, likelihood of seeing some coin flips conditional on coin biases) but this doesn't seem like a case where this is feasible.

2DanielFilan1h

To be clear, this is an equivalent way of looking at normal prior-ful inference, and doesn't actually solve any practical problem you might have. I mostly see it as a demonstration of how you can shove everything into stuff that gets expressed as likelihood functions.

A Bayesian Aggregation Paradox

Jsevillamol

In short: There is no objective way of summarizing a Bayesian update over an event with three outcomes $A : B : C$ as an update over two outcomes $A : \neg A$ .

Suppose there is an event with possible outcomes $A, B, C$ .
We have prior beliefs about the outcomes $p_{1} : p_{2} : p_{3}$ .
An expert reports a likelihood factor of $e_{1} : e_{2} : e_{3}$ .
Our posterior beliefs about $A : B : C$ are then $p_{1} \cdot e_{1} : p_{2} \cdot e_{2} : p_{3} \cdot e_{3}$ .

$$\underbrace{\begin{pmatrix} p_{1} \\ p_{2} \\ p_{3} \end{pmatrix}}_{\text{Prior}} \times \underbrace{\begin{pmatrix} e_{1} \\ e_{2} \\ e_{3} \end{pmatrix}}_{\text{Update}} = \underbrace{\begin{pmatrix} p_{1} \cdot e_{1} \\ p_{2} \cdot e_{2} \\ p_{3} \cdot e_{3} \end{pmatrix}}_{\text{Posterior}}$$

But suppose we only care about whether $A$ happens.
Our prior beliefs about $A : \neg A$ are $p_{1} : (p_{2} + p_{3})$ .
Our posterior beliefs are $p_{1} \cdot e_{1} : (p_{2} \cdot e_{2} + p_{3} \cdot e_{3})$ .
This implies that the likelihood factor of the expert regarding $A : \neg A$ is $\frac{p_{1} \cdot e_{1} : (p_{2} \cdot e_{2} + p_{3} \cdot e_{3})}{p_{1} : (p_{2} + p_{3})} = e_{1} : \frac{p_{2} \cdot e_{2} + p_{3} \cdot e_{3}}{p_{2} + p_{3}}$ .

$$\underbrace{\begin{pmatrix} p_{1} \\ p_{2} + p_{3} \end{pmatrix}}_{\text{Prior}} \times \underbrace{\begin{pmatrix} e_{1} \\ \frac{p_{2} \cdot e_{2} + p_{3} \cdot e_{3}}{p_{2} + p_{3}} \end{pmatrix}}_{\text{Update}} = \underbrace{\begin{pmatrix} p_{1} \cdot e_{1} \\ p_{2} \cdot e_{2} + p_{3} \cdot e_{3} \end{pmatrix}}_{\text{Posterior}}$$

This likelihood factor depends on the ratio of prior beliefs $p_{2} : p_{3}$ .

Concretely, the lower factor in the update is the weighted mean of the evidence $e_{2}$ and $e_{3}$ according to the weights $p_{2}$ and $p_{3}$ .
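A quick numerical check of this dependence, as a sketch: the evidence ratio 2 : 1 : 4 and both priors are made-up numbers, and `binary_likelihood_factor` is a hypothetical helper that just evaluates the formula above with exact fractions.

```python
from fractions import Fraction


def binary_likelihood_factor(prior, evidence):
    """Collapse a three-outcome update into an A : not-A likelihood factor.

    Returns (e1, (p2*e2 + p3*e3) / (p2 + p3)), i.e. the upper and lower
    factors of the two-outcome update.
    """
    p1, p2, p3 = (Fraction(p) for p in prior)
    e1, e2, e3 = (Fraction(e) for e in evidence)
    return e1, (p2 * e2 + p3 * e3) / (p2 + p3)


evidence = (2, 1, 4)  # hypothetical expert report e1 : e2 : e3

uniform = binary_likelihood_factor((1, 1, 1), evidence)
skewed = binary_likelihood_factor((1, 3, 1), evidence)

print("prior 1:1:1 ->", uniform[0], ":", uniform[1])
print("prior 1:3:1 ->", skewed[0], ":", skewed[1])
```

With the prior 1 : 1 : 1 the lower factor exceeds the upper one, so the same report counts as evidence against $A$; with the prior 1 : 3 : 1 the lower factor is smaller, so it counts as evidence for $A$. Only the ratio $p_{2} : p_{3}$ changed between the two cases.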

This has a relatively straightforward interpretation. The update is supposed to be the ratio of the likelihoods under each hypothesis. The upper factor in the update is $P (E | A)$ . The lower factor is $P (E | B \cup C) = \frac{P (B) \cdot P (E | B) + P (C) \cdot P (E | C)}{P (B) + P (C)}$ .

$$\underbrace{\begin{pmatrix} P (A | E) \\ P (B \cup C | E) \end{pmatrix}}_{\text{Posterior}} \propto \underbrace{\begin{pmatrix} P (A) \\ P (B \cup C) \end{pmatrix}}_{\text{Prior}} \times \underbrace{\begin{pmatrix} P (E | A) \\ P (E | B \cup C) \end{pmatrix}}_{\text{Update}}$$

$$\underbrace{\begin{pmatrix} P (E | A) \\ P (E | B \cup C) \end{pmatrix}}_{\text{Update}} = \begin{pmatrix} P (E | A) \\ \frac{P (E \cap (B \cup C))}{P (B \cup C)} \end{pmatrix} = \begin{pmatrix} P (E | A) \\ \frac{P (B) \cdot P (E | B) + P (C) \cdot P (E | C)}{P (B) + P (C)} \end{pmatrix}$$

I found this very surprising -...

(Continue Reading – 1798 more words)

DanielFilan1h20

Is this just the thing where evidence is theory-laden? Like, for example, how the evidentiary value of the WHO report on the question of COVID origins depends on how likely one thinks it is that people would effectively cover up a lab leak?


LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA