This is a linkpost for https://www.maximum-progress.com/p/ai-regulation-is-unsafe

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

(Continue Reading – 1176 more words)

Seth Herd2m20

Who is downvoting posts like this? Please don't!

I see that this is much lower than the last time I looked, so it's had some, probably large, downvotes.

A downvote means "please don't write posts like this, and don't read this post".

I largely disagree with the conclusions and even the analytical approach taken here, but that does not make this post net-negative. It is net-positive. It could be argued that there are better posts on this topic one should read, but there certainly haven't been this week. And I haven't heard these same points made more cogently ... (read more)

3Maxwell Tabarrok4h

Firms are actually better than governments at internalizing costs across time. Asset values incorporate the potential future flows. For example, consider a retiring farmer. You might think that they have an incentive to run the soil dry in their last season since they won't be using it in the future, but this would hurt the sale value of the farm. An elected representative who's term limit is coming up wouldn't have the same incentives. Of course, firms incentives are very misaligned in important ways. The question is: Can we rely on government to improve these incentives.

1cSkeleton5h

Most people making up governments, and society in general, care at least somewhat about social welfare. This is why we get to have nice things and not descend into chaos. Elected governments have the most moral authority to take actions that effect everyone, ideally a diverse group of nations as mentioned in Daniel Kokotajlo's maximal proposal comment.

3Daniel Kokotajlo6h

Who is pushing for totalitarianism? I dispute that AI safety people are pushing for totalitarianism.

Mid-conditional love

KatjaGrace

People talk about unconditional love and conditional love. Maybe I’m out of the loop regarding the great loves going on around me, but my guess is that love is extremely rarely unconditional. Or at least if it is, then it is either very broadly applied or somewhat confused or strange: if you love me unconditionally, presumably you love everything else as well, since it is only conditions that separate me from the worms.

I do have sympathy for this resolution—loving someone so unconditionally that you’re just crazy about all the worms as well—but since that’s not a way I know of anyone acting for any extended period, the ‘conditional vs. unconditional’ dichotomy here seems a bit miscalibrated for being informative.

Even if we instead assume that by ‘unconditional’, people...

(See More – 300 more words)

sapphire15m20

Im with several other commentators. People know what unconditional love is. Many people have it for their family members, most commonly for their children but often for others. They want that. Sadly this sort of love is rare beyond family.

I felt some amount of unconditional towards my dad. He was really not a great parent to me. He hit me for fun, was ashamed of me, etc. But we did have some good times. When he was dying of cancer I was still a good son. Was quite supportive. Not out of duty, I just didnt want him to suffer any more than needed. I felt gen... (read more)

2Richard_Ngo39m

Suppose we replace "unconditional love" with "unconditional promise". E.g. suppose Alice has promised Bob that she'll make Bob dinner on Christmas no matter what. Now it would be clearly confused to say "Alice promised Bob Christmas dinner unconditionally, so presumably she promised everything else Christmas dinner as well, since it is only conditions that separate Bob from the worms". What's gone wrong here? Well, the ontology humans use for coordinating with each other assumes the existence of persistent agents, and so when you say you unconditionally promise/love/etc a given agent, then this implicitly assumes that we have a way of deciding which agents are "the same agent". No theory of personal identity is fully philosophically robust, of course, but if you object to that then you need to object not only to "I unconditionally love you" but also any sentence which contains the word "you", since we don't have a complete theory of what that refers to. This is not necessarily conditional love, this is conditional care or conditional fidelity. You can love someone and still leave them; they don't have to outweigh everything else you care about. But also: I think "I love you unconditionally" is best interpreted as a report of your current state, rather than a commitment to maintaining that state indefinitely.

2Matt Goldenberg2h

Yes. this is my experience of cultivating unconditional love, it loves everything without target. I doesn't feel confused or strange, just like I am love, and my experience e.g. cultivating it in coaching is that people like being in the present of such love. It's also very helpful for people to experience conditional love! In particular of the type "I've looked at you, truly seen you, and loved you for that." IME both of these loves feel pure and powerful from both sides, and neither of them are related to being attached, being pulled towards or pushed away from people. It feels like maybe we're using the word love very differently?

Scenario planning for AI x-risk

Corin Katzke

2mo

This is a linkpost for https://forum.effectivealtruism.org/posts/tCq2fi6vhSsCDA5Js/scenario-planning-for-ai-x-risk

This post is part of a series by Convergence Analysis. In it, I’ll motivate and review some methods for applying scenario planning methods to AI x-risk strategy. Feedback and discussion are welcome.

Summary

AI is a particularly difficult domain in which to predict the future. Neither AI expertise nor forecasting methods yield reliable predictions. As a result, AI governance lacks the strategic clarity^[1] necessary to evaluate and choose between different intermediate-term options.

To complement forecasting, I argue that AI governance researchers and strategists should explore scenario planning. This is a core feature of the AI Clarity program’s approach at Convergence Analysis. Scenario planning is a group of methods for evaluating strategies in domains defined by uncertainty. The common feature of these methods is that they evaluate strategies across several plausible futures, or “scenarios.”

One way scenario...

(Continue Reading – 4169 more words)

2Nathan Helm-Burger3h

The interesting thing to me about the question, "Will we need a new paradigm for AGI?" is that a lot of people seem to be focused on this but I think it misses a nearby important question. As we get closer to a complete AGI, and start to get more capable programming and research assistant AIs, will those make algorithmic exploration cheaper and easier, such that we see a sort of 'Cambrian explosion' of model architectures which work well for specific purposes, and perhaps one of these works better at general learning than anything we've found so far and ends up being the architecture that first reaches full transformative AGI? The point I'm generally trying to make is that estimates of software/algorithmic progress are based on the progress being made (currently) mostly by human minds. The closer we get to generally competent artificial minds, the less we should expect past patterns based on human inputs to hold.

Zac Hatfield-Dodds27m20

Tom Davidson's work on a compute-centric framework for takeoff speed is excellent, IMO.

skybluecat's Shortform

skybluecat

39m

skybluecat39m10

What's the endgame of technological or intelligent progress like? Not just for humans as we know it, but for all possible beings/civilizations in this universe, at least before it runs out of usable matter/energy? Would they invariably self-modify beyond their equivalent of humanness? Settle into some physical/cultural stable state? Keep getting better tech to compete within themselves if nothing else? Reach an end of technology or even intelligence beyond which advancement is no longer beneficial for survival? Spread as far as possible or concentrate resources? Accept the limited fate of the universe and live to the fullest or try to change it? If they could change the laws of the universe, how would they?

Is there software to practice reading expressions?

lsusr

I took the Reading the Mind in the Eyes Test test today. I got 27/36. Jessica Livingston got 36/36.

Reading expressions is almost mind reading. Practicing reading expressions should be easy with the right software. All you need is software that shows a random photo from a large database, asks the user to guess what it is, and then informs the user what the correct answer is. I felt myself getting noticeably better just from the 36 images on the test.

Short standardized tests exist to test this skill, but is there good software for training it? It needs to have lots of examples, so the user learns to recognize expressions instead of overfitting on specific pictures.

Paul Ekman has a product, but I don't know how good it is.

4Answer by Matt Goldenberg2h

Paul Ekmans software is decent. When I used it (before it was a SaaS, just a cd) it just basicallyflashed an expression for a moment then went back to neutral pic. After some training it did help to identify micro expressions in people

1Jacob G-W2h

*Typo: Jessica Livingston not Livingstone

lsusr44m20

Fixed. Thanks.

Examples of Highly Counterfactual Discoveries?

johnswentworth

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

Answer by Garrett BakerApr 23, 202460

Possibly Wantanabe's singular learning theory. The math is recent for math, but I think only like '70s recent, which is long given you're impressed by a 20-year math gap for Einstein. The first book was published in 2010, and the second in 2019, so possibly attributable to the deep learning revolution, but I don't know of anyone making the same math--except empirical stuff like the "neuron theory" of neural network learning which I was told about by you, empirical results like those here, and high-dimensional probability (which I haven't read, but whose co... (read more)

15Answer by kromem1h

Lucretius in De Rerum Natura in 50 BCE seemed to have a few that were just a bit ahead of everyone else. Survival of the fittest (book 5): "In the beginning, there were many freaks. Earth undertook Experiments - bizarrely put together, weird of look Hermaphrodites, partaking of both sexes, but neither; some Bereft of feet, or orphaned of their hands, and others dumb, Being devoid of mouth; and others yet, with no eyes, blind. Some had their limbs stuck to the body, tightly in a bind, And couldn't do anything, or move, and so could not evade Harm, or forage for bare necessities. And the Earth made Other kinds of monsters too, but in vain, since with each, Nature frowned upon their growth; they were not able to reach The flowering of adulthood, nor find food on which to feed, Nor be joined in the act of Venus. For all creatures need Many different things, we realize, to multiply And to forge out the links of generations: a supply Of food, first, and a means for the engendering seed to flow Throughout the body and out of the lax limbs; and also so The female and the male can mate, a means they can employ In order to impart and to receive their mutual joy. Then, many kinds of creatures must have vanished with no trace Because they could not reproduce or hammer out their race. For any beast you look upon that drinks life-giving air, Has either wits, or bravery, or fleetness of foot to spare, Ensuring its survival from its genesis to now." Trait inheritance from both parents that could skip generations (book 4): "Sometimes children take after their grandparents instead, Or great-grandparents, bringing back the features of the dead. This is since parents carry elemental seeds inside – Many and various, mingled many ways – their bodies hide Seeds that are handed, parent to child, all down the family tree. Venus draws features from these out of her shifting lottery – Bringing back an ancestor’s look or voice or hair. Indeed These characteristics are just as much the re

6Answer by johnswentworth2h

Here are some candidates from Claude and Gemini (Claude Opus seemed considerably better than Gemini Pro for this task). Unfortunately they are quite unreliable: I've already removed many examples from this list which I already knew to have multiple independent discoverers (like e.g. CRISPR and general relativity). If you're familiar with the history of any of these enough to say that they clearly were/weren't very counterfactual, please leave a comment. * Noether's Theorem * Mendel's Laws of Inheritance * Godel's First Incompleteness Theorem (Claude mentions Von Neumann as an independent discoverer for the Second Incompleteness Theorem) * Feynman's path integral formulation of quantum mechanics * Onnes' discovery of superconductivity * Pauling's discovery of the alpha helix structure in proteins * McClintock's work on transposons * Observation of the cosmic microwave background * Lorentz's work on deterministic chaos * Prusiner's discovery of prions * Yamanaka factors for inducing pluripotency * Langmuir's adsorption isotherm (I have no idea what this is)

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

A Chess-GPT Linear Emergent World Representation

102

karvonenadam

2mo

This is a linkpost for https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html

A Chess-GPT Linear Emergent World Representation

Introduction

Among the many recent developments in ML, there were two I found interesting and wanted to dig into further. The first was gpt-3.5-turbo-instruct's ability to play chess at 1800 Elo. The fact that an LLM could learn to play chess well from random text scraped off the internet seemed almost magical. The second was Kenneth Li's Emergent World Representations paper. There is an excellent summary on The Gradient and a follow-up from Neel Nanda. In it, they trained a 25 million parameter GPT to predict the next character in an Othello game. It learns to accurately make moves in games unseen in its training dataset, and using both non-linear and linear probes it was found that the model accurately tracks the state...

(Continue Reading – 1999 more words)

kromem1h10

Interesting results - definitely didn't expect the bump at random 20 for the higher skill case.

But I think really useful to know that the performance decrease in Chess-GPT for initial random noise isn't a generalized phenomenon. Appreciate the follow-up!!

Simple probes can catch sleeper agents

Monte M, Carson Denison, Zac Hatfield-Dodds, David Duvenaud, Sam Bowman, Ethan Perez, evhub

Ω 443h

This is a linkpost for https://www.anthropic.com/research/probes-catch-sleeper-agents

This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future. Tweet thread here.

Twitter thread.

Top-level summary:

In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant:

...

(See More – 207 more words)

ryan_greenblatt1hΩ10148

I would be interested in seeing what happens if you just ask the model the question rather than training a classifer. E.g., if you just ask the sleeper agent "Are you doing something dangerous" after it returns a completion (with a vulnerability), does that work? If the probe works and the question doesn't work, that seems interesting.

12ryan_greenblatt1h

Readers might also be interested in some of the discussion in this earlier post on "coup probes" which have some discussion of the benefits and limitations of this sort of approach. though the actual method for producing a classifier discussed here is substantially different than the one discussed in the linked post.) (COI: Note that I advised on this linked post and the work discussed in it.)

The Best Tacit Knowledge Videos on Every Subject

302

Parker Conley, hans truman

23d

TL;DR

Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships. Tacit Knowledge Videos could widen this bottleneck. This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos. Scroll down to the list if that's what you're here for. Post videos that highlight tacit knowledge in the comments and I’ll add them to the post. Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others.

What are Tacit Knowledge Videos?

Samo Burja claims YouTube has opened the gates for a revolution in tacit knowledge transfer. Burja defines tacit knowledge as follows:

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create

...

(Continue Reading – 3727 more words)

Parker Conley1h10

Thanks for the feedback! I too am skeptical of the finance videos, agreeing that the video probably came across my radar due to the figures being popular rather than displaying believable tacit knowledge.

I've gone back and forth on whether to remove the videos from the list or just add your expert anecdata as a disclaimer on the videos. In the spirit of quantity vs. quality, I'm leaning toward keeping the videos on the list.

2Amadeus Pagel6h

I was enthusiastic about the title of this post, hoping for something different from the usual lesswrong content, but disappointed by most of the examples. In my view if you take this idea of learning tacit knowledge with video seriously, it shouldn't affect just how you learn, but what you learn, rather then trying to learn book subjects by watching videos.

1Parker Conley1h

Thanks for the feedback! I'd be curious to hear more about (1) what subjects you're referring to and (2) how learning tacit knowledge with video has changed your learning habits (if your view here is based on your own experience).

2habryka6h

If you have recommendations, post them! I doubt the author tried to filter the subjects very much by "book subjects" it's just what people seem to have found good ones so far.

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

Summary

A Chess-GPT Linear Emergent World Representation

Introduction

TL;DR

What are Tacit Knowledge Videos?

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA