Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.
Concerns over AI safety and calls for government control over the technology are highly correlated, but they should not be.
There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.
Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology: there are reasonable rules that the government might set, but omission bias and incentives to protect small but well-organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...
Firms don't behave like their libertarian ideal. There's a huge amount of short-termist behavior within firms, and of managers building their careers to the detriment of the firms they work for.
And yeah, I don't think government can make firms more long-termist. The best it can do is ban some bad stuff. For example, a ban on AI-generated content in some small area would lead to incrementally less investment in AI, which would give humanity more time to live, which is a good thing.
The following is an example of how, if one assumes that an AI (in this case an autoregressive LLM) has "feelings", "qualia", "emotions", whatever, it can be unclear whether it is experiencing something more like pain or something more like pleasure, even in quite simple settings that already come up a lot with existing LLMs. This dilemma is part of the reason I think the philosophy of AI suffering/happiness is very hard and we most probably won't be able to solve it.
Consider the two following scenarios:
Scenario A: An LLM is asked a complicated question and answers it eagerly.
Scenario B: A user insults an LLM and it responds.
For the sake of simplicity, let's say that the LLM is an autoregressive transformer with no RLHF (I personally think that the...
The American school system, grades K-12, leaves much to be desired.
While its flaws are legion, this post isn’t about that. It’s easy to complain.
This post is about how we could do better.
To be clear, I’m talking about redesigning public education, so “just use the X model” where X is “charter” or “Montessori” or “home school” or “private school” isn’t sufficient. This merits actual thought and discussion.
One of the biggest problems facing public schools is that they’re asked to do several very different kinds of tasks.
On the one hand, the primary purpose of school is to educate children.
On whatever hand happens to be the case in real life, school is often more a source of social services for children and parents alike, providing food and safety...
What if you build your school-as-social-service, and then one day find that the kids are selling drugs to each other inside the school?
Or simply that the kids are constantly interfering with each other so much that the minority who want to follow their interests can't?
Any theory of school that doesn't mention discipline is a theory of dry water. You say "for 100 children, we could have 2 nurses, 5 counselors, 5 social workers, 8 adult supervisors". This omits everything that makes a school work or fail. To start with, what specific powers and duties would your 1-supervisor-per-12-kids have? Can they remove disruptive kids from rooms? From the building entirely? Give detentions?
This is part 7 of 30 in the Hammertime Sequence. Click here for the intro.
As we move into the introspective segment of Hammertime, I want to frame our approach around the set of (unoriginal) ideas I laid out in The Solitaire Principle. The main idea was that a human being is best thought of as a medley of loosely-related, semi-independent agents across time, and also as governed by a panel of relatively antagonistic sub-personalities à la Inside Out.
An enormous amount of progress can therefore be made simply by articulating the viewpoints of one’s sub-personalities so as to build empathy and trust between them. This is the aim of the remainder of the first cycle.
Goal factoring is a CFAR technique with a lot of parts. The most...
I really can't get the point of "3. Solve or Reduce Aversions", specifically:
> Meanwhile, un-endorsed aversions should be targeted with exposure therapy or CoZE.
As far as I can see, we should get rid of bad aversions here. But the rest of the text sounds like we should... reinforce them?
> To apply exposure therapy, build a path of incremental steps towards the aversion
This is a write-up of Neel's and my experience and opinions on best practices for doing Activation Patching. An arXiv PDF version of this post is available here (easier to cite). A previous version was shared with MATS Program scholars in July 2023 under the title "Everything Activation Patching".
Prerequisites: This post is mainly aimed at people who are familiar with the basic ideas behind activation patching. For background, see this ARENA tutorial or this post by Neel.
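Before getting into best practices, here is a minimal sketch of what a single activation patch looks like in code. This is my own illustration using the TransformerLens library (which the ARENA tutorial above is built on), not code from the post; the model, prompts, layer, and position are arbitrary choices for demonstration.

```python
# Minimal single-site activation patch, sketched with TransformerLens.
# The prompts, layer, and position are illustrative, not from the post.
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Clean/corrupt prompts that differ in a single token, so positions align.
clean_tokens = model.to_tokens("When John and Mary went to the bar, John gave a drink to")
corrupt_tokens = model.to_tokens("When John and Mary went to the bar, Mary gave a drink to")

# Run the clean prompt once and cache every intermediate activation.
_, clean_cache = model.run_with_cache(clean_tokens)

LAYER, POS = 6, 9  # arbitrary residual-stream site to patch

def patch_hook(resid, hook):
    # Overwrite the corrupted run's activation at one position
    # with the cached clean activation from the same site.
    resid[:, POS, :] = clean_cache[hook.name][:, POS, :]
    return resid

patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(utils.get_act_name("resid_pre", LAYER), patch_hook)],
)
# Compare patched_logits against the clean and corrupt baselines with a
# metric (e.g. the logit difference between " John" and " Mary" at the
# final position); sweeping LAYER and POS maps where the causally
# relevant information lives.
```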
TL;DR:
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly afterwards anyway, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries to complement Wikipedia's list of multiple discoveries.
To...
An example whose counterfactual impact was probably not very large is the discovery of DNA as the carrier of inheritance.
I had great fun reading Watson's scientific-literary fiction The Double Helix. Watson and Crick were very clear that competitors were hot on their heels, perhaps only a matter of months behind.
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.
A concrete example: let's say we notice that Jim has just pushed the turn-signal lever on the side of his steering wheel. Why did Jim do this?
The goal-directed-behavior story is as follows:
On April Fools the LW team released an album under the name of the Fooming Shoggoths. Ever since, the amount that I think about rationality has skyrocketed.
That's because I've been listening exclusively to it when I'd usually be listening to other music, especially "Thought That Faster" (feat. Eliezer Yudkowsky). I now find that when I come to a problem's conclusion, I often do look back and think, "How could I have thought that faster?"
So, I've started attempting to add to the rationalist musical canon using Udio. Here are two attempts I think turned out well. I especially like the first one.
For example, when I hear phrases from a song in everyday life, I complete the pattern.
I feel like...
(I appreciate object-level engagement in general, but this seems combatively worded.)
The rest of this reply responds to arguments.
Why should the Earth superintelligence care about you, but not about the 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?