All of PeterMcCluskey's Comments + Replies

I had nitrous oxide once at a dentist. It is a dissociative anesthetic. It may have caused something like selective amnesia. I remember that the dentist was drilling, but I have no clear memory of pain associated with it. It's a bit hard to evaluate exactly what it does, but it definitely has some benefits. Maybe the pain seemed too distant from me to be worth my attention?

A much higher fraction of the benefits of prediction markets are public goods than is the case for insurance.

Most forms of insurance took a good deal of time and effort before they were widely accepted. It's unclear whether there's a dramatic difference in the rate of adoption of prediction markets compared to insurance.

I'm reaffirming my relatively extensive review of this post.

The simbox idea seems like a valuable guide for safely testing AIs, even if the rest of the post turns out to be wrong.

Here's my too-terse summary of the post's most important (and more controversial) proposal: have the AI grow up in an artificial society, learning self-empowerment and learning to model other agents. Use something like retargeting the search to convert the AI's goals from self-empowerment to empowering other agents.

I'm reaffirming my relatively long review of Drexler's full QNR paper.

Drexler's QNR proposal seems like it would, if implemented, guide AI toward more comprehensible systems. It might modestly speed up capabilities advances, while being somewhat more effective at making alignment easier.

Alas, the full paper is long, and not an easy read. I don't think I've managed to summarize its strengths well enough to persuade many people to read it.

This post didn't feel particularly important when I first read it.

Yet I notice that I've been acting on the post's advice since reading it. E.g. being more optimistic about drug companies that measure a wide variety of biomarkers.

I wasn't consciously doing that because I had updated on the post. I'm unsure to what extent the post changed me via subconscious influence, versus my deriving the ideas independently.

Exchanges require more capital to move the price closer to the extremes than to move it closer to 50%.
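
One mechanism behind this, illustrated with made-up prices (my framing, not a description of any particular exchange's order book): a YES buyer correcting the same five-point mispricing ties up roughly twice as much capital per dollar of expected profit at 90% as at 50%.

```python
# Sketch with hypothetical numbers: capital a YES buyer ties up per dollar of
# expected profit when pushing a binary contract (paying $1) toward its true
# probability. The same five-point correction gets more capital-hungry near
# the extreme than near 50%.
def capital_per_dollar_of_edge(market_price: float, true_prob: float) -> float:
    capital = market_price                      # cost of one YES contract
    expected_profit = true_prob - market_price  # expected gain per contract
    return capital / expected_profit

for price, true_prob in [(0.50, 0.55), (0.70, 0.75), (0.90, 0.95)]:
    ratio = capital_per_dollar_of_edge(price, true_prob)
    print(f"{price:.2f} -> {true_prob:.2f}: ${ratio:.0f} of capital per $1 of expected profit")
# 0.50 -> 0.55: $10 of capital per $1 of expected profit
# 0.90 -> 0.95: $18 of capital per $1 of expected profit
```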

This post is one of the best available explanations of what has been wrong with the approach used by Eliezer and people associated with him.

I had a pretty favorable recollection of the post from when I first read it. Rereading it convinced me that I still managed to underestimate it.

In my first pass at reviewing posts from 2022, I had some trouble deciding which post best explained shard theory. Now that I've reread this post during my second pass, I've decided this is the most important shard theory post. Not because it explains shard theory best, but bec... (read more)

Oops. I misread which questions you were comparing.

Now that I've read the full questions in the actual paper, it looks like some of the difference is due to "within 100 years" versus at any time horizon.

I consider it far-fetched that much of the risk is over 100 years away, but it's logically possible, and Robin Hanson might endorse a similar response.

I don't quite see the logical contradiction that your Twitter poll asks about.

I wouldn't be surprised if the answers reflect framing effects. But the answers seem logically consistent if we assume that some people believe that severe disempowerment is good.

[This comment is no longer endorsed by its author]
4 PeterMcCluskey 2mo
Oops. I misread which questions you were comparing. Now that I've read the full questions in the actual paper, it looks like some of the difference is due to "within 100 years" versus at any time horizon. I consider it far-fetched that much of the risk is over 100 years away, but it's logically possible, and Robin Hanson might endorse a similar response.

The Fed can stimulate nominal demand at the ZLB. But (outside of times when it's correcting the results of overly tight monetary conditions) that means mostly more inflation, and has strongly diminishing returns on increased real consumption.

Eventually the economy would reach a new equilibrium (which presumably would contain the same amount of private consumption as the old equilibrium).

I expect less consumption in the new equilibrium.

The Fed has limited power to affect real demand. Fed stimulus is only helpful if there's unemployment due to something like deflation.

2 Logan Zoellner 2mo
I predict step 3 causes a lot of unemployment. We also seem to have different opinions about whether the ZLB is a real thing.  Even at the ZLB I think the Fed can still stimulate demand with QE.

I realize now that some of this post was influenced by a post that I'd forgotten reading: Causal confusion as an argument against the scaling hypothesis, which does a better job of explaining what I meant by causal modeling being hard.

I agree there's something strange about Loyal's strategy.

But it's not like all aging researchers act like they back Loyal's approach. Intervene Immune has been getting good biomarker results in human trials by taking nearly the opposite approach: raising IGF-1 levels for a while.

I wrote a longer discussion about IGF-1 and aging in my review of Morgan Levine's book True Age.

If someone comes into the hospital

That's a bad criterion to use.

See Robin Hanson's Buy Health proposal for a better option.

Is this the post you're looking for?

I've got a Mercedes with Active Blind Spot Assist, which eliminates the need to worry about this.

I understand how we can avoid trusting an AI if we've got a specification that the proof checker understands.

Where I expect to need an AI is for generating the right specifications.

Note that effectively we are saying to trust the neural network

I expect that we're going to have to rely on some neural networks regardless of how we approach AI. This paper guides us to be more strategic about what reliance to put on which neural networks.

4 Steve_Omohundro 5mo
Fortunately, for coarse "guardrails" the specs are pretty simple and can often be reused in many contexts. For example, all software we want to run should have proofs that: 1) there aren't memory leaks, 2) there aren't out-of-bounds memory accesses, 3) there aren't race conditions, 4) there aren't type violations, 5) there aren't buffer overflows, 6) private information is not exposed by the program, 7) there aren't infinite loops, etc. There should be a widely used "metaspec" for those criteria which most program synthesis AI will have to prove their generated code satisfies.

Similarly, there are common constraints for many physical systems: e.g. robots, cars, planes, boats, etc. shouldn't crash into things or harm humans. The more refined the rules are, the more subtle they become. To prevent existentially bad outcomes, I believe coarse constraints suffice. But certainly we eventually want much more refined models of the world and of the outcomes we seek. I'm a fan of "Digital Twins" of physical systems, which allow rules and constraints to be run in simulation and can help in choosing specifications. We certainly want those simulations to be trusted, which can be achieved by proving that the code actually simulates the systems it claims to.

Eventually it would be great to have fully trusted AI as well! Mechanistic Interpretability should be great for that! I'm just reading Anthropic's recent nice advances in that. If that continues to make progress then it makes our lives much easier, but it doesn't eliminate the need to ensure that misaligned AGI and malicious AGI don't cause harm. The big win with the proof checking and the cryptographic hardware we propose is that we can ensure that even powerful systems will obey rules that humanity selects. If we don't implement that kind of system (or something functionally equivalent), then there will be dangerous pathways which malicious AGI can exploit to cause great harm to humans.
2 Mikola Lysenko 5mo
That's a great link, thanks! Though it doesn't really address the point I made, they do briefly mention it:

> Interestingly, diamond has the highest known oxidative chemical storage density because it has the highest atom number (and bond) density per unit volume. Organic materials store less energy per unit volume, from ~3 times less than diamond for cholesterol, to ~5 times less for vegetable protein, to ~10–12 times less for amino acids and wood ...
>
> Since replibots must build energy-rich product structures (e.g. diamondoid) by consuming relatively energy-poor feedstock structures (e.g., biomass), it may not be possible for biosphere conversion to proceed entirely to completion (e.g., all carbon atoms incorporated into nanorobots) using chemical energy alone, even taking into account the possible energy value of the decarbonified sludge byproduct, though such unused carbon may enter the atmosphere as CO2 and will still be lost to the biosphere.

Unfortunately they never bother to follow up on this with the rest of their calculations, and instead base their estimate for replication times on how long it takes the nanobots to eat up all the available atoms.

However, in my estimation the bottleneck on nanobot replication is not getting materials, but probably storing up enough joules to overcome the Gibbs free energy of assembling another diamondoid nanobot from spare parts. I would love to have a better picture of this estimate, since it seems like the determining factor in whether this stuff can actually proceed exponentially or not.

I initially dismissed Orthogonal due to a guess that their worldview was too similar to MIRI's, and that they would give up or reach a dead end for reasons similar to why MIRI hasn't made much progress.

Then the gears to ascension prodded me to take a closer look.

Now that I've read their more important posts, I'm more confused.

I still think Orthogonal has a pretty low chance of making a difference, but there's enough that's unique about their ideas to be worth pursuing. I've donated $15k to Orthogonal.

Eliminating the profit motive would likely mean that militaries develop dangerous AI a few years later.

I'm guessing that most people's main reason is that it looks easier to ban AI research than to sufficiently reduce the profit motive.

The belief in a universal, independent standard for altruism, morality, and right and wrong is deeply ingrained in societal norms.

That's true of the norms in WEIRD cultures. It is far from universal.

4 Mitchell_Porter 6mo
Posts from this account appear to be AI-generated. Another such account is @super-agi, but whoever is behind that one does actually interact with comments. We shall see if @George360 is capable of that.

I expect such acausal collaboration to be harder to develop than good calibration, and therefore less likely to happen at the stage I have in mind.

1 [anonymous] 6mo
I think it would be good if you're right. I'm curious why you believe this. (Feel free to link other posts/comments discussing this, if there are any)

the people choosing this many white cars seem low-level insane

The increase in white cars seems to follow a 2007 study, An Investigation into the Relationship between Vehicle Colour and Crash Risk, which says light-colored cars are safer. Maybe it's just a coincidence.

2 localdeity 7mo
The reason I prefer a white car is that it absorbs less heat via sunlight.  A source says "Studies have shown the difference in temperature between a white car and a black car left in the sun can be as much as 5-6 degrees after just one hour."

Thank you for narrowing my confusion over what AI_0 does.

My top question now is: how long does AI_0 need to run, and why is it safe from other AIs during that period?

AI_0 appears to need a nontrivial fraction of our future lightcone to produce a decent approximation of the intended output. Yet keeping it boxed seems to leave the world vulnerable to other AIs.

I disagree. The macro environment is good enough that the Fed could easily handle any contraction, provided they focus on forward-looking indicators such as the TIPS spread, or near-real-time indicators such as the ISM purchasing managers' numbers.

Now seems like a good time for the Fed to start decreasing interest rates.

1 Glenn Clayton 7mo
I see. Yeah, I don't disagree that inflation is better, but it is certainly not a non-issue. Imagine what would happen if the Fed dropped interest rates (rather than simply pausing them at the current rate). The point I was making relative to inflation is that the traditional playbook for responding to a contraction is difficult to picture given the macro environment. My guess is that even Kevin Erdmann would agree with that.

This is less than half correct.

There's still a widespread labor shortage. A slowdown might mean significant unemployment in Silicon Valley, but it will mean a return to normal in most places.

Inflation is back to normal. It only looks high to people who are focused on lagging indicators such as the CPI.

1 Glenn Clayton 8mo
I'd love to hear what specifically you disagree with. I don't know of anyone who believes that inflation is back to normal. Can you cite anything to back that up? Also, I'd love to see any data supporting your contention that there is a widespread labor shortage among the subset of the labor market I'm addressing. As an active VC, I haven't seen evidence of that at all.

Most of the problem with the reference ranges is that they are usually just intended to reflect what 95% of the reference population will have. That's much easier to measure than the range which indicates good health.

There isn't much incentive for any authority to establish guidelines for healthy ranges. So too many people end up equating "normal" results with good results, because normal is what gets quantified, and is usually what is reported on test results.
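
A minimal sketch of what that typically means in practice (simulated data, purely illustrative): the reference range is just the central 95% of a reference population's results, which tells you what's common rather than what's healthy.

```python
# Illustrative only: a "reference range" computed the standard way, as the
# central 95% of results from a reference population (simulated here).
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are results for some biomarker from 10,000 reference subjects
# (distribution and units are made up for the example).
results = rng.normal(loc=100, scale=15, size=10_000)

low, high = np.percentile(results, [2.5, 97.5])
print(f"reference range: {low:.0f}-{high:.0f}")  # roughly 70-130 for a sample like this

# Nothing in this calculation asks which values indicate good health; if most
# of the reference population is unhealthy on this marker, "normal" results
# are still reported as normal.
```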

1 AlexFromSafeTransition 7mo
Thank you! I have read it and made a lot of updates. For example, I renamed the concept to a simbox, and I added an idea for a religion for the AIs and how to make them believe it, in the "A great backstory that maximizes the odds of success" section.

I see hints that a fair amount of value might be hiding in this post. Here's an attempt at rewriting the parts of this post that I think I understand, with my own opinions shown in {braces}. I likely changed a good deal of the emphasis to reflect my worldview. I presume my comments will reveal some combination of my confused mangling of your ideas, and your cryptic communication style. I erred on the side of rewriting it from scratch to reduce the risk that I copy your text without understanding it. I'm posting a partial version of it in order to get feedback... (read more)

Oops, you're right. Section 36.6 does advocate modularity, in a way that hints at the vibe you describe. And my review of the CAIS paper did say things about modularity that seem less likely now than they did 4 years ago.

I agree that people have gotten vibes from the paper which have been somewhat discredited.

Yet I don't see how that vibe followed from what he wrote. He tried to clarify that having systems with specialized goals does not imply they have only narrow knowledge. See section 21 of the CAIS paper ("Broad world knowledge can support safe task performance").

Are people collapsing "AI with narrow goals" and "AI with only specialized knowledge" into one concept "narrow AI"?

3 Matthew Barnett 8mo
What is the vibe you're interpreting me as stating? I didn't mean that Drexler said that "systems with specialized goals will have only narrow knowledge". What I wrote was that I interpreted the CAIS world as one where we'd train a model from scratch for each task. The update that I'm pointing out is that the costs of automating tasks can be massively parallelized across tasks, not that AIs will have broad knowledge of the world.

Verified safe software means the battle shifts to vulnerabilities in any human who has authority over the system.

3 reallyeli 9mo
This seems tougher for attackers because experimentation with specific humans is much costlier than experimentation with automated systems. (But I'm unsure of the overall dynamics in this world!)

What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible?

Part of the reason for more hope is that people are more trustworthy if they commit to avoiding the worst forms of unilateralist curses and world conquest. So by having committed to avoiding the pivotal act, leading actors become more likely to cooperate in ways that avoid the need for a pivotal act.

If a single pivotal act becomes possible, then it seems likely that it will also be possible to find friendlier pivota... (read more)

Cheap printing was likely a nontrivial factor, but was influenced by much more than just the character sets. Printing presses weren't very reliable or affordable until a bunch of component technologies reached certain levels of sophistication. Even after they became practical, most cultures had limited interest in them.

1 Alexander E. 9mo
The other obstacles to printing could theoretically be overcome. Merchants and missionaries would have transferred western printing technologies across the globe given enough time. Character sets pose a far more fundamental problem; they may be the deciding factor in why other complex civilizations were so slow to adopt printing and industrialization.

Japan is one of the few successful examples of historical societies catching up to the west in technology. It too failed to adopt the printing press for a long time, a fact that was accompanied by stagnation. Japan's adoption of advanced printing technologies more suitable to oriental characters coincided with its meteoric economic rise during the Meiji restoration. Most regions of the world at the time were not sufficiently advanced to take full advantage of the education opportunities provided by printing.

The printing press factor doesn't address why some regions of the world developed complex civilizations while other regions didn't. But it could perhaps explain why one specific civilization (Europe) advanced so much faster after the advent of printing, while other complex civilizations (who were arguably ahead of Europe at the time) stagnated.

Filtering out entire sites seems too broad and too crude to have much benefit.

I see plenty of room to turn this into a somewhat good proposal by having GPT-4 look through the dataset for a narrow set of topics. Something close to "how we will test AIs for deception".

2 beren 9mo
Yes, I think what I proposed here is the broadest and crudest thing that will work. It can of course be much more targeted to specific proposals or posts that we think are potentially most dangerous. Using existing language models to rank these is an interesting idea.

A good deal of this post is correct. But the goals of language models are more complex than you admit, and not fully specified by natural language. LLMs do something that's approximately a simulation of a human. Those simulated quasi-humans are likely to have quasi-human goals that are unstated and tricky to observe, for much the same reasons that humans have such goals.

LLMs also have goals that influence what kind of human they simulate. We'll know approximately what those goals are, due to our knowledge of what generated those goals. But how do we tell whether approximately is good enough?

No. I found a claim of good results here. Beyond that I'm relying on vague impressions from very indirect sources, plus fictional evidence such as the movie Latter Days.

Many rationalists do follow something resembling the book's advice.

CFAR started out with too much emphasis on lecturing people, but quickly noticed that wasn't working, and pivoted to more emphasis on listening to people and making them feel comfortable. This is somewhat hard to see if you only know the rationalist movement via its online presence.

Eliezer is far from being the world's best listener, and that likely contributed to some failures in promoting rationality. But he did attract and encourage people who overcame his shortcomings for CFAR's in-pers... (read more)

1 bc4026bd4aaa5b7fe 9mo
Fair enough, I haven't interacted with CFAR at all. And the "rationalists have failed" framing is admittedly partly bait to keep you reading, partly parroting/interpreting how Yudkowsky appears to see his efforts towards AI Safety, and partly me projecting my own AI anxieties out there. The Overton window around AI has also been shifting so quickly that this article may already be kind of outdated. (Although I think the core message is still strong.)

Someone else in the comments pointed out the religious proselytization angle, and yeah, I hadn't thought about that, and apparently neither did David. That line was basically a throwaway joke lampshading how all the organizations discussed in the book are left-leaning; I don't endorse it very strongly.
2 mukashi 9mo
Any source you would recommend to know more about the specific practices of Mormons you are referring to?

Does the literature on the economics of reputation have ideas that are helpful?

I haven't thought this out very carefully. I'm imagining a transformer trained both to predict text, and to predict the next frame of video.

Train it on all available videos that show realistic human body language.

Then ask the transformer to rate on a numeric scale how positively or negatively a human would feel in any particular situation.

This does not seem sufficient for a safe result, but implies that LeCun is less nutty than your model of him suggests.

2 Steven Byrnes 10mo
I'm still confused. Here you're describing what you're hoping will happen at inference time. I'm asking how it's trained, such that that happens. If you have a next-frame video predictor, you can't ask it how a human would feel. You can't ask it anything at all - except "what might be the next frame of thus-and-such video?". Right?

I wonder if you've gotten thrown off by chatGPT etc. Those are NOT trained by SSL, and therefore NOT indicative of how SSL-trained models behave. They're pre-trained by SSL, but then they're fine-tuned by supervised learning, RLHF, etc. The grizzled old LLM people will tell you about the behavior of pure-SSL models, which everyone used before like a year ago. They're quite different. You cannot just ask them a question and expect them to spit out an answer. You have to prompt them in more elaborate ways.

(On a different topic, self-supervised pre-training before supervised fine-tuning is almost always better than supervised learning from random initialization, as far as I understand. Presumably if someone were following the OP protocol, which involves a supervised learning step, then they would follow all the modern best practices for supervised learning, and “start from a self-supervised-pretrained model” is part of those best practices.)

Why assume LeCun would use only supervised learning to create the IC module?

If I were trying to make this model work, I'd use mainly self-supervised learning that's aimed at getting the module to predict what a typical human would feel. (I'd also pray for a highly multipolar scenario if I were making this module immutable when deployed.)

2 Steven Byrnes 10mo
I don’t follow. Can you explain in more detail? “Self-supervised learning” means training a model to predict some function / subset of the input data from a different function / subset of the input data, right? What’s the input data here, and what is the prediction target?

Might this paradigm be tested by measuring LLM fluid intelligence?

I predict that a good test would show that current LLMs have modest amounts of fluid intelligence, and that LLM fluid intelligence will increase in ways that look closer to continuous improvement than to a binary transition from nothing to human-level.

I'm unclear whether it's realistic to get a good enough measure of fluid intelligence to resolve this apparent crux, but I'm eager to pursue any available empirical tests of AI risk.
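
To make the kind of test I have in mind slightly more concrete, here's a toy sketch (my own illustration, nowhere near a validated measure of fluid intelligence): generate simple novel puzzles programmatically, so they're unlikely to appear verbatim in training data, and score how often the model infers the rule. ask_model is a placeholder for whichever LLM is being tested.

```python
# Toy sketch, not a validated test: score an LLM on freshly generated
# sequence-completion puzzles. ask_model is a placeholder callable that
# takes a prompt string and returns the model's answer as a string.
import random

def make_puzzle(rng: random.Random) -> tuple[str, str]:
    start, step = rng.randint(2, 20), rng.randint(2, 9)
    seq = [start + i * step for i in range(4)]
    prompt = ("What number comes next in this sequence: "
              + ", ".join(map(str, seq))
              + "? Reply with just the number.")
    return prompt, str(start + 4 * step)

def fluid_score(ask_model, n: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(
        ask_model(prompt).strip() == answer
        for prompt, answer in (make_puzzle(rng) for _ in range(n))
    )
    return hits / n
```

A real test would need a much richer item bank (analogies, matrix-style puzzles, odd-one-out) and some guard against leakage into training data, but even a crude version could hint at whether the ability looks continuous or binary.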

Upvoted for clarifying a possibly important crux. I still have trouble seeing a coherent theory here.

I can see a binary difference between Turing-complete minds and lesser minds, but only if I focus on the infinite memory and implicitly infinite speed of a genuine Turing machine. But you've made it clear that's not what you mean.

When I try to apply that to actual minds, I see a wide range of abilities at general-purpose modeling of the world.

Some of the differences in what I think of as general intelligence are a function of resources, which implies a fair... (read more)

2 Thane Ruthenis 10mo
Hm, I think your objections are mostly similar to the objections cfoster0 is raising in this thread, so in lieu of repeating myself, I'll just link there. Do point out if I misunderstood and some of your points are left unaddressed.

In the hypothetical where there’s no general intelligence, there’s no such thing as “smarter”,

It sure looks like many species of animals can be usefully compared as smarter than others. The same is true of different versions of LLMs. Why shouldn't I conclude that most of those have what you call general intelligence?

If a hostile alien civilization notices us, we’re going to die. But if we’re going to die from the AGI anyway, who cares?

Anyone with a p(doom from AGI) < 99% should conclude that the harm from this outweighs the likely benefits.

2 [comment deleted] 10mo
2 RomanS 10mo
Not sure about it. Depends on the proportion of alien civilizations that will cause more harm than good upon contact with us. The proportion is unknown. A common argument is that an interstellar civilization must be sufficiently advanced in both tech and ethics. But I don't think the argument is very convincing.

I’m guessing something like a 0.1% success rate. I think this is sufficient for success if you have automated the process and can afford to run the process enough to generate and test millions of possibilities. This is a largely parallelizable process, so it doesn’t necessarily take much wall clock time.

How much compute would it take to test a million of these in parallel? I assume you're imagining something less compute-intensive than retraining a million GPTs from scratch, but I'm unclear how much less compute-intensive.

How much evidence does it need ... (read more)

2 Nathan Helm-Burger 10mo
I respond to you and Max in my other comment. https://www.lesswrong.com/posts/zwAHF5tmFDTDD6ZoY/will-gpt-5-be-able-to-self-improve?commentId=bB2ssvhEjjsPovuTh

If months of debate with superforecasters didn’t accomplish much, that’s really disheartening

I participated in Tetlock's tournament. Most people devoted a couple of hours to this particular topic, spread out over months.

A significant fraction of the disagreement was about whether AI would be transformative this century. I made a bit of progress on this, but didn't get enough feedback to do much. AFAICT, many superforecasters know that reports of AI progress were mostly hype in prior decades, and are assuming that is continuing unless they see strong evidence to the contrary. They're typically not willing to spend much more than an hour looking for such evidence.
