LessWrong

This is a linkpost for https://www.maximum-progress.com/p/ai-regulation-is-unsafe

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

(Continue Reading – 1176 more words)

Daniel Kokotajlo2m20

Big +1 to that. Part of why I support (some kinds of) AI regulation is that I think they'll reduce the risk of totalitarianism, not increase it.

2Daniel Kokotajlo3m

So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year?

Eric Neyman's Shortform

Eric Neyman

2Wei Dai4h

Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018 that these values could well be negative. I'm very uncertain but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.

1mesaoptimizer6h

e/acc is not a coherent philosophy and treating it as one means you are fighting shadows. Landian accelerationism at least is somewhat coherent. "e/acc" is a bundle of memes that support the self-interest of the people supporting and propagating it, both financially (VC money, dreams of making it big) and socially (the non-Beff e/acc vibe is one of optimism and hope and to do things -- to engage with the object level -- instead of just trying to steer social reality). A more charitable interpretation is that the philosophical roots of "e/acc" are founded upon a frustration with how bad things are, and a desire to improve things by yourself. This is a sentiment I share and empathize with. I find the term "techno-optimism" to be a more accurate description of the latter, and perhaps "Beff Jezos philosophy" a more accurate description of what you have in your mind. And "e/acc" to mainly describe the community and its coordinated movements at steering the world towards outcomes that the people within the community perceive as benefiting them.

Quinn12m10

sure -- i agree that's why i said "something adjacent to" because it had enough overlap in properties. I think my comment completely stands with a different word choice, I'm just not sure what word choice would do a better job.

Examples of Highly Counterfactual Discoveries?

140

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

3ChristianKl1h

Counterfactual means, that if something would not have happened something else would have happened. It's a key concept in Judea Pearl's work on causality.

3Lukas_Gloor2h

In some of his books on evolution, Dawkins also said very similar things when commenting on Darwin vs Wallace, basically saying that there's no comparison, Darwin had a better grasp of things, justified it better and more extensively, didn't have muddled thinking about mechanisms, etc.

francis kafka27m10

I mean to some extent, Dawkins isn't a historian of science, presentism, yadda yadda but from what I've seen he's right here. Not that Wallace is somehow worse, given that of all the people out there he was certainly closer than the rest. That's about it

2dr_s3h

Well, it's hard to tell because most other civilizations at the required level of wealth to discover this (by which I mean both sailing and surplus enough to have people who worry about the shape of the Earth at all) could one way or another have learned it via osmosis from Greece. If you only have essentially two examples, how do you tell whether it was the one who discovered it who was unusually observant rather than the one who didn't who was unusually blind? But it's an interesting question, it might indeed be a relatively accidental thing which for some reason was accepted sooner than you would have expected (after all, sails disappearing could be explained by an Earth that's merely dome-shaped; the strongest evidence for a completely spherical shape was probably the fact that lunar eclipses feature always a perfect disc shaped shadow, and even that requires interpreting eclipses correctly, and having enough of them in the first place).

Johannes C. Mayer's Shortform

Johannes C. Mayer

Carl Feynman37m10

I would highly recommend getting someone else to debug your subconscious for you. At least it worked for me. I don’t think it would be possible for me to have debugged myself.

My first therapist was highly directive. He’d say stuff like “Try noticing when you think X, and asking yourself what happened immediately before that. Report back next week.” And listing agenda items and drawing diagrams on a whiteboard. As an engineer, I loved it. My second therapist was more in the “providing supportive comments while I tal... (read more)

Losing Faith In Contrarianism

omnizoid

18h

Crosspost from my blog.

If you spend a lot of time in the blogosphere, you’ll find a great deal of people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard of Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn’t work, and various other people expressing contrarian views. Often, very smart people—like Robin Hanson—will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.

For...

(Continue Reading – 1290 more words)

niplav1h20

The obsessive autists who have spent 10,000 hours researching the topic and writing boring articles in support of the mainstream position are left ignored.

It seems like you're spanning up three different categories of thinkers: Academics, public intellectuals, and "obsessive autists".

Notice that the examples you give overlap in those categories: Hanson and Caplan are academics (professors!), while the Natália Mendonça is not an academic, but is approaching being a public intellectual by now(?). Similarly, Scott Alexander strikes me as being in the "publ... (read more)

2ChristianKl3h

What makes you believe that Substack is to blame and not him unpublishing it?

2ChristianKl3h

He explicitly says that the people who argue that there's no gap are mistaken to argue that. He argues for the gap being small, not nonexistent. He does not use the term "near zero" himself.

1Jacob G-W3h

Noted, thanks.

Any evidence or reason to expect a multiverse / Everett branches?

lukehmiles

17d

My understanding is that pilot wave theory (ie Bohmian mechanics) explains all the quantum physics with no weirdness like "superposition collapse" or "every particle interaction creates n parallel universes which never physically interfere with each other". It is not fully "local" but who cares?

Is there any reason at all to expect some kind of multiverse? Why is the multiverse idea still heavily referenced (eg in acausal trade posts)?

Edit April 11: I challenge the properly physics brained people here (I am myself just a Q poster) to prove my guess wrong: Can you get the Born rule with clean hands this way?

They also implicitly claim that in order for the Born rule to work [under pilot wave], the particles have to start the sim following the psi^2 distribution.

...

(See More – 92 more words)

TAG1h20

The other problem is that MWI is up against various subjective and non-realist interpretations, so it's not it's not the case that you can build an ontological model of every interpretation.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

An Introduction to AI Sandbagging

Teun van der Weij, Felix Hofstätter, Francis Rhys Ward

Summary: Evaluations provide crucial information to determine the safety of AI systems which might be deployed or (further) developed. These development and deployment decisions have important safety consequences, and therefore they require trustworthy information. One reason why evaluation results might be untrustworthy is sandbagging, which we define as strategic underperformance on an evaluation. The strategic nature can originate from the developer (developer sandbagging) and the AI system itself (AI system sandbagging). This post is an introduction to the problem of sandbagging.

The Volkswagen emissions scandal

There are environmental regulations which require the reduction of harmful emissions from diesel vehicles, with the goal of protecting public health and the environment. Volkswagen struggled to meet these emissions standards while maintaining the desired performance and fuel efficiency of their diesel engines (Wikipedia). Consequently, Volkswagen...

(Continue Reading – 2217 more words)

dirk's Shortform

dirk

dirk1h10

Sometimes a vague phrasing is not an inaccurate demarkation of a more precise concept, but an accurate demarkation of an imprecise concept

1dirk2h

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's it's easy to substitute yours in without realizing you've misunderstood.

1dirk3h

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are stronger than the ones directly-felt even, despite reliably seeming on initial inspection to be simply neutral metadata).

1dirk3h

Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.

Magic by forgetting

avturchin

Epistemic – this post is more suitable for LW as it was 10 years ago

Thought experiment with curing a disease by forgetting

Imagine I have a bad but rare disease X. I may try to escape it in the following way:

1. I enter the blank state of mind and forget that I had X.

2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as other my copies, some of them have disease X, but most don’t.

3. Now I can use self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly the same observer-moments.

4. Based on this, the chances that my next observer-moment after...

(Continue Reading – 1099 more words)

Donald Hobson1h20

True. But for that you need there to exist another mind almost identical to yours except for that one thing.

In the question "how much of my memories can I delete while retaining my thread of subjective experience?" I don't expect there to be an objective answer.

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

The Volkswagen emissions scandal

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA