Why do some societies exhibit more antisocial punishment than others? Martin explores some of the literature on the subject, as well as his own experience living in a country where "punishment of cooperators" was fairly common.

William_S (1d)
I worked at OpenAI for three years, from 2021-2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people, which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
habryka (1d)
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens. Curious whether anyone has looked into this, or has thought much about the baseline risk of assassinations or other forms of violence from economic actors.
Thomas Kwa (14h)
You should update by ±1% on AI doom surprisingly frequently. This is just a fact about how stochastic processes work. If your p(doom) is Brownian motion in 1% steps starting at 50% and stopping once it reaches 0 or 1, then there will be about 50^2 = 2500 steps of size 1%. This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years, then you should make a 1% update 4-5 times per week. In practice there won't be as many, due to heavy-tailedness in the distribution concentrating the updates in fewer events, and the fact that you don't start at 50%. But I do believe that evidence is coming in every week such that ideal market prices should move by 1% on maybe half of weeks, and it is not crazy for your probabilities to shift by 1% during many weeks if you think about it often enough.
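A minimal simulation sketch of the model above (not from the original quick take; the function name and trial count are illustrative choices, assuming the stated symmetric ±1% walk started at 50% and absorbed at 0% or 100%):

```python
# Minimal sketch: verify that a symmetric +/-1 percentage-point random walk
# started at 50 and absorbed at 0 or 100 takes about 50*50 = 2500 steps on
# average. Working in whole percentage points avoids floating-point drift.
import random

def steps_to_certainty(start_pct: int = 50) -> int:
    """Simulate one belief trajectory; count steps until it hits 0 or 100."""
    p, steps = start_pct, 0
    while 0 < p < 100:
        p += 1 if random.random() < 0.5 else -1
        steps += 1
    return steps

trials = 2000
mean_steps = sum(steps_to_certainty() for _ in range(trials)) / trials
print(f"average steps to certainty: {mean_steps:.0f} (theory: 2500)")
```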
Dalcy (1d)
Thoughtdump on why I'm interested in computational mechanics:

* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind but rather because it seemed broadly similar in direction to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction on real-world noisy data. CSSR is an example of a reconstruction algorithm (a toy sketch of the idea appears after this list). apparently people did compmech stuff on real-world data; i don't know how good it was, but far less effort has been invested there compared to theory work
* would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm reconstructing it? of course it's gonna be unwieldy large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines, and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automaton -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to say about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them.
* eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
* many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
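A toy sketch of the core CSSR move mentioned above (illustrative only, not Dalcy's code or the real algorithm): cluster fixed-length histories whose empirical next-symbol distributions are close, and treat the clusters as estimated causal states. Real CSSR grows the history length and uses proper hypothesis tests; the thresholding, function names, and golden-mean example here are assumptions made for illustration.

```python
# Toy sketch of the CSSR idea (illustrative only): cluster fixed-length
# histories by their empirical next-symbol distributions; clusters approximate
# causal states. Real CSSR grows the history length and uses hypothesis tests.
import random
from collections import Counter, defaultdict

def estimate_causal_states(sequence, history_len=1, tol=0.1):
    counts = defaultdict(Counter)  # history -> Counter of next symbols
    for i in range(len(sequence) - history_len):
        counts[sequence[i:i + history_len]][sequence[i + history_len]] += 1

    def dist(c):
        total = sum(c.values())
        return {s: n / total for s, n in c.items()}

    def tv(p, q):  # total-variation distance between two distributions
        return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))

    states = []  # each state is a list of histories with similar predictions
    for history, c in counts.items():
        for state in states:
            if tv(dist(c), dist(counts[state[0]])) < tol:
                state.append(history)
                break
        else:
            states.append([history])
    return states

# Example: the golden-mean process (no two 1s in a row) has two causal states:
# "just emitted 1" (must emit 0 next) and "just emitted 0" (emit 0 or 1 evenly).
random.seed(0)
seq, prev = [], "0"
for _ in range(20000):
    prev = "0" if prev == "1" else random.choice("01")
    seq.append(prev)
print(estimate_causal_states("".join(seq)))  # expect two groups: ['0'] and ['1']
```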
Buck (2d)
[epistemic status: I think I’m mostly right about the main thrust here, but probably some of the specific arguments below are wrong. In the following, I'm much more stating conclusions than providing full arguments. This claim isn’t particularly original to me.]

I’m interested in the following subset of risk from AI:

* Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
* Scheming: Risk associated with loss of control to AIs that arises from AIs scheming.
  * So e.g. I exclude state actors stealing weights in ways that aren’t enabled by the AIs scheming, and I also exclude non-scheming failure modes. IMO, state actors stealing weights is a serious threat, but non-scheming failure modes aren’t (at this level of capability and dignity).
* Medium dignity: that is, developers of these AIs are putting a reasonable amount of effort into preventing catastrophic outcomes from their AIs (perhaps they’re spending the equivalent of 10% of their budget on cost-effective measures to prevent catastrophes).
* Nearcasted: no substantial fundamental progress on AI safety techniques, no substantial changes in how AI works.

This subset of risk is interesting because I think it’s a natural scenario at which to target technical work on AI safety. (E.g. it’s the main scenario we’re targeting with our AI control agenda.) I claim that the majority of this risk comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

* It’s very expensive to refrain from using AIs for this application.
* There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

If I’m right that the risk from scheming early-transformative models is concentrated onto this pretty specific scenario, it implies a bunch of things:

* It implies that work on mitigating these risks should focus on this very specific setting.
* It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.
* It is pretty plausible to me that AI control is quite easy, because you actually can remove affordances from the AIs that are doing AI R&D such that it’s hard for them to cause problems.

Recent Discussion

Yesterday, I had a coronectomy: the top halves of my bottom wisdom teeth were surgically removed. It was my first time being sedated, and I didn’t know what to expect. While I was unconscious during the surgery, the hour after surgery turned out to be a fascinating experience, because I was completely lucid but had almost zero short-term memory.

My girlfriend, who had kindly agreed to accompany me to the surgery, was with me during that hour. And so — apparently against the advice of the nurses — I spent that whole hour talking to her and asking her questions.

The biggest reason I find my experience fascinating is that it has mostly answered a question that I’ve had about myself for quite a long time: how deterministic am...

Viliam (5m)

It could be an interesting experiment to build up this list iteratively. Like, every question you ask for the third time, the answer gets added at the bottom of the list. How long will the list get, and what will it contain?

johnswentworth (31m)
My answer.
Lucie Philippon (2h)
I consumed edible cannabis for the first time a few months ago, and it felt very similar to the experience you're describing. I felt regularly surprised at where I was, and had lots of trouble remembering more than the last 30 seconds of the conversation.  The most troubling experience was listening to someone telling me something, me replying, and while saying the reply, forgetting where I was, what I was replying to and what I already said. The weirdest part is that at this point I would finish the reply in a sort of disconnected state, not knowing where the words were coming from, and at the end I would have a feeling of "I said what I wanted to say", even though I could not remember a word of it.
Johannes C. Mayer (6h)
Maybe you need to think the thought many times over in order to overwrite the original memory. In your place, I would try to prepare something similar to what Drake did: some mental object you can retrieve that has a predesigned hole to put information in. To me, it seems like this should not be that hard to get. Then, for ideally 30 minutes or so after the surgery when you don't have short-term memory (though the streaming-algorithm experiment also seems very interesting), you can repeatedly try to insert some specific object into the memory. Maybe it would make sense, for the sake of the experiment, to limit yourself to 3 possible objects that could be inserted. Your girlfriend can then choose one randomly after surgery, for you to drill into the memory by repeatedly thinking about the scene completed with that specific object. Then after the 30 minutes, you do something completely different. Then 1 hour afterwards your girlfriend can ask you what the object was that she told you 1 hour ago (and probably also several times during the first 30 minutes). Probably it would be best if your girlfriend (or whatever person is willing to do this) constantly reminds you during the first 30 minutes or so that you need to imagine the object, at least every minute or so.

Previously: On the Proposed California SB 1047.

Text of the bill is here. It focuses on safety requirements for highly capable AI models.

This is written as an FAQ, tackling all questions or points I saw raised.

Safe & Secure AI Innovation Act also has a description page.

Why Are We Here Again?

There have been many highly vocal and forceful objections to SB 1047 this week, in reaction to a (disputed and seemingly incorrect) claim that the bill has been ‘fast tracked.’ 

The bill continues to have a substantial chance of becoming law according to Manifold, where the market has not moved on recent events. The bill has been referred to two policy committees, one of which put out this 38-page analysis.

The purpose of this post is to gather and analyze all...

Jiro (8m)

If your model is not projected to be at least 2024 state of the art and it is not over the 10^26 flops limit?

It's not going to be 2024 forever. In the future being 2024 state of the art won't be as hard as it is in actual 2024.

That developers risk going to jail for making a mistake on a form.

  1. This (almost) never happens.

Because prosecuting someone for making a mistake on a form happens when the government wants to go after an otherwise innocent person for unacceptable reasons, so they prosecute a crime that goes unprosecuted 99% of the time.


A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnesic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.

… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnesic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnesic, it’s also apparently one...

Another class of applications which we discussed at the retreat: person 1 takes the amnesic, person 2 shares private information on them, and then person 1 gives their reaction to the private information. Can be used e.g. for complex negotiations: maybe it is in our mutual best interest to make some deal, but in order for me to know that I'd need some information which you don't want to share with me, so I take the drug, you share the information, and I record some verified record of myself saying "dear future self, you should in fact take this deal".

... which is cool in theory but I would guess not of high immediate value in practice, which is why the post didn't focus on it.

Algon (23m)
Important notice: benzodiazepines are serious business. Benzo withdrawals are amongst the worst experiences a human can go through, and combinations of benzos with alcohol, barbiturates, opioids, or tricyclic antidepressants are very dangerous: benzos played a role in 31% of the estimated 22,767 deaths from prescription drug overdose in the United States. If you're experimenting with benzos, please be very careful!
metachirality (26m)
You can actually use this to do the sleeping beauty experiment IRL and thereby test SIA vs SSA. Unfortunately you can only get results if you're the one being put under.

I say this because I can hardly use a computer without constantly getting distracted. Even when I actively try to ignore how bad software is, the suggestions keep coming.

Seriously Obsidian? You could not come up with a system where links to headings can't break? This makes you wonder what is wrong with humanity. But then I remember that humanity is building a god without knowing what they will want.

So for those of you who need to hear this: I feel you. It could be so much better. But right now, can we really afford to make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>?

Can we really afford to do this while our god software looks like...

May this find you well.

faul_sname (3h)
Haskell is a beautiful language, but in my admittedly limited experience it's been quite hard to reason about memory usage in deployed software (which is important because programs run on physical hardware; no matter how beautiful your abstract machine, you will run into issues where the assumptions that abstraction makes don't match reality).

That's not to say more robust programming languages aren't possible. IMO Rust is quite nice, and easily interoperable with a lot of existing code, which is probably a major factor in why it's seeing much higher adoption.

But to echo and build off what @ustice said earlier: the hard part of programming isn't writing a program that transforms simple inputs with fully known properties into simple outputs that meet some known requirement. The hard parts are finding or creating a mostly-non-leaky abstraction that maps well onto your inputs, and determining what precise machine-interpretable rules produce outputs that look like the ones you want.

Most bugs I've seen come at the boundaries of the system, where it turns out that one of your assumptions about your inputs was wrong, or that one of your assumptions about how your outputs will be used was wrong. I almost never see bugs like this:

* My sort(list, comparison_fn) function fails to correctly sort the list
* My graph traversal algorithm skips nodes it should have hit
* My pick_winning_poker_hand() function doesn't always recognize straights

Instead, I usually see stuff like:

* My program assumes that when the server receives an order_received webhook, and then hits the server to fetch the order details from the vendor's API for the order identified in the webhook payload, the vendor's API will return the order details and not a 404 not found
* My server returns nothing at all when fetching the user's bill for this month, because while the logic is correct (determine the amount due for each order and sum), this particular user had 350,000 individual orders this
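A hypothetical sketch of the webhook failure mode described above (the endpoint, function name, and return convention are invented for illustration, not taken from the comment): the buggy assumption is that the vendor's API always knows about an order it just announced, and making the 404 case explicit forces the caller to decide how to handle it.

```python
# Hypothetical sketch of a boundary assumption made explicit: the vendor's API
# may 404 on an order it just announced via webhook (e.g. eventual consistency
# on their side), so that case is surfaced instead of silently assumed away.
from typing import Optional
import requests

VENDOR_API_BASE = "https://vendor.example.com/api"  # placeholder URL

def fetch_order(order_id: str) -> Optional[dict]:
    """Fetch order details after an order_received webhook.

    Returns None if the vendor doesn't know the order yet, so the caller can
    retry later instead of proceeding with missing data.
    """
    resp = requests.get(f"{VENDOR_API_BASE}/orders/{order_id}", timeout=10)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()  # surface other unexpected statuses loudly
    return resp.json()
```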
Nevin Wetherill (5h)
I have been contemplating Connor Leahy's Cyborgism and what it would mean for us to improve human workflows enough that aligning AGI looks less like: Sisyphus attempting to roll a 20 tonne version of The One Ring To Rule Them All into the caldera of Mordor while blindfolded and occasionally having to bypass vertical slopes made out of impossibility proofs that have been discussed by only 3 total mathematicians ever in the history of our species - all before Sauron destroys the world after waking up from a restless nap of an unknown length.

I think this is what you meant by "make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>". Intuitively, the level I'm picturing is: a suite of tools that can be booted up from a single icon on the home screen of a computer, which then allows anyone who has decent taste in software to create essentially any program they can imagine, up to a level of polish that people can't poke holes in even if you give a million reviewers 10 years of free time.

Can something at this level be accomplished? Well, what does coding look like currently? It seems to look like a bunch of people with dark circles under their eyes reading long strings of characters in something basically equivalent to an advanced text editor, with a bunch of additional little windows of libraries and graphics and tools. This is not a domain where human intelligence performs with as much ease as in other domains like spearfishing or bushcraft.

If you want to build Cyborgs, I am pretty sure where you start is by focusing on building software that isn't god-machines, throwing out the old book of tacit knowledge, and starting over with something that makes each step as intuitive as possible. You probably also focus way more on quality over quantity/speed.

So, plaintext instructions on what kind of software you want to build, or a code repository and a plaintext list of modifications? L
Dagon (5h)
Agreed, but it's not just software.  It's every complex system, anything which requires detailed coordination of more than a few dozen humans and has efficiency pressure put upon it.  Software is the clearest example, because there's so much of it and it feels like it should be easy.
Viliam (23m)

Consider the pressures and incentives. Adding new features can help you sell the software to more users. Fixing bugs... unless the application is practically falling apart, it does not make much of a difference. After all, the bugs will only get noticed by people who already use your application, i.e. they already paid for it.

For the artificial intelligence, I assume the "killer app" will be its integration with SharePoint.

lc (16h)
I seriously doubt on priors that Boeing corporate is murdering employees.

This sort of begs the question of why we don't observe other companies assassinating whistleblowers.

1a3orn (10h)
I mean, sure, but I've been updating in that direction a weirdly large amount.

(ElevenLabs reading of this post:)

I'm excited to share a project I've been working on that I think many in the LessWrong community will appreciate - converting some rational fiction into high-quality audiobooks using cutting-edge AI voice technology from ElevenLabs, under the name "Askwho Casts AI".

The keystone of this project is an audiobook version of Planecrash (AKA Project Lawful), the epic glowfic authored by Eliezer Yudkowsky and Lintamande. Given the scope and scale of this work, with its large cast of characters, I'm using ElevenLabs to give each character their own distinct voice. Converting this story into audiobook form has been a labor of love, and I hope if anyone has bounced off it before, this...

habryka (31m)

Added an embedded audio element for you.

Neel Nanda (41m)
Thanks for making these! How expensive is it?
spencerb (1h)
Is there a way to access this without a substack subscription?
JanPro (1h)
Thank you! The Planecrash audiobook is great, and I would not have read it if it were not for the audio version.
niplav (34m)

Because[1] for a Bayesian reasoner, there is conservation of expected evidence.

Although I've seen it mentioned that, technically, the beliefs of a Bayesian should follow a martingale, and Brownian motion is a martingale.


  1. I'm not super technically strong on this particular part of the math. Intuitively it could be that in a bounded reasoner which can only evaluate programs in , any pattern in its beliefs that can be described by an algorithm in is detected and the predicted future belief from that pattern is incorporated into current belief

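To spell out the identity behind conservation of expected evidence (a standard Bayesian fact, stated here for reference rather than taken from the comment): averaging the possible posteriors over the evidence you might observe gives back the prior, which is exactly the martingale property mentioned above.

```latex
% Conservation of expected evidence: the expected posterior equals the prior.
\[
\mathbb{E}_{e}\!\left[P(H \mid e)\right]
  \;=\; \sum_{e} P(e)\,P(H \mid e)
  \;=\; \sum_{e} P(H, e)
  \;=\; P(H).
\]
% Hence the credence process p_t = P(H | evidence up to time t) satisfies
% E[p_{t+1} | p_1, ..., p_t] = p_t, i.e. it is a martingale; driftless
% Brownian motion is one continuous-time example of such a process.
```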
Dagon (6h)
I think this leans a lot on "get evidence uniformly over the next 10 years" and "Brownian motion in 1% steps".  By conservation of expected evidence, I can't predict the mean direction of future evidence, but I can have some probabilities over distributions which add up to 0.   For long-term aggregate predictions of event-or-not (those which will be resolved at least a few years away, with many causal paths possible), the most likely updates are a steady reduction as the resolution date gets closer, AND random fairly large positive updates as we learn of things which make the event more likely.
LawrenceC (7h)
The general version of this statement is something like: if your beliefs satisfy the law of total expectation, the variance of the whole process should equal the variance of all the increments involved in the process.[1] In the case of the random walk where at each step your beliefs go up or down by 1%, starting from 50% until you hit 100% or 0%, the variance of each increment is 0.01^2 = 0.0001, and the variance of the entire process is 0.5^2 = 0.25, hence you need 0.25/0.0001 = 2500 steps in expectation. If your beliefs have probability p of going up or down by 1% at each step, and 1-p of staying the same, the variance is reduced by a factor of p, and so you need 2500/p steps. (Indeed, something like this is the standard way to derive the expected number of steps before a random walk hits an absorbing barrier.) Similarly, you get that if you start at 20% or 80%, you need 1600 steps in expectation, and if you start at 1% or 99%, you'll need 99 steps in expectation.

----------------------------------------

One problem with your reasoning above is that, as the 1%/99% case shows, needing 99 steps in expectation does not mean you will take 99 steps with high probability -- in this case, there's a 50% chance you need only one update before you're certain (!), there's just a tail of very long sequences. In general, the expected value of variables need not look like

I also think you're underrating how much the math changes when your beliefs do not come in the form of uniform updates. In the most extreme case, suppose your current 50% doom number comes from imagining that doom is uniformly distributed over the next 10 years, and zero after -- then the median update size per week is only 0.5/520 ~= 0.096%/week, and the expected number of weeks with a >1% update is 0.5 (it only happens when you observe doom). Even if we buy a time-invariant random walk model of belief updating, as the expected size of your updates get larger, you also expect there to be quadratically fewer of them -- e.
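One standard route to the numbers cited above, via the optional stopping theorem (a derivation sketch added for reference, not text from the comment):

```latex
% Belief process p_t: starts at p, moves by +/- s with equal probability,
% absorbed at 0 and 1. Both p_t and p_t^2 - s^2 t are martingales, so by
% optional stopping at the absorption time T:
\[
\mathbb{E}[p_T] = p \;\Longrightarrow\; \Pr[\text{absorbed at } 1] = p,
\]
\[
\mathbb{E}[p_T^2] - s^2\,\mathbb{E}[T] = p^2
\;\Longrightarrow\;
\mathbb{E}[T] \;=\; \frac{\mathbb{E}[p_T^2] - p^2}{s^2}
            \;=\; \frac{p(1-p)}{s^2}.
\]
% With s = 0.01: p = 0.5 gives 2500 expected steps, p = 0.2 or 0.8 gives 1600,
% and p = 0.01 or 0.99 gives 99, matching the figures above.
```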
niplav (37m)
Thank you a lot for this. I think this or @Thomas Kwa's comment would make an excellent original-sequences-style post—it doesn't need to be long, but just going through an example and talking about the assumptions would be really valuable for applied rationality. After all, it's about how much one should expect one's beliefs to vary, which is pretty important.

Crossposted from the AI Optimists blog.

AI doom scenarios often suppose that future AIs will engage in scheming—planning to escape, gain power, and pursue ulterior motives, while deceiving us into thinking they are aligned with our interests. The worry is that if a schemer escapes, it may seek world domination to ensure humans do not interfere with its plans, whatever they may be.

In this essay, we debunk the counting argument—a central reason to think AIs might become schemers, according to a recent report by AI safety researcher Joe Carlsmith.[1] It’s premised on the idea that schemers can have “a wide variety of goals,” while the motivations of a non-schemer must be benign by definition. Since there are “more” possible schemers than non-schemers, the argument goes, we should...

AviS (37m)

I agree that, overall, counting arguments are weak.

But even if you expect SGD to be used for TAI, generalisation is not a good counterexample, because maybe most counting arguments about SGD do work except for generalisation (which would not be surprising, because we selected SGD precisely because it generalises well).

Many thanks to Spencer Greenberg, Lucius Caviola, Josh Lewis, John Bargh, Ben Pace, Diogo de Lucena, and Philip Gubbins for their valuable ideas and feedback at each stage of this project—as well as the ~375 EAs + alignment researchers who provided the data that made this project possible.

Background

Last month, AE Studio launched two surveys: one for alignment researchers, and another for the broader EA community. 

We got some surprisingly interesting results, and we're excited to share them here.

We set out to better explore and compare various population-level dynamics within and across both groups. We examined everything from demographics and personality traits to community views on specific EA/alignment-related topics. We took on this project because it seemed to be largely unexplored and rife with potentially-very-high-value insights. In this post, we’ll present what...

Thank you so much for conducting this survey! I want to share some information on behalf of MATS:

  • In comparison to the AIS survey gender ratio of 9 M:F, MATS Winter 2023-24 scholars and mentors were 4 M:F and 12 M:F, respectively. Our Winter 2023-24 applicants were 4.6 M:F, whereas our Summer 2024 applicants were 2.6 M:F, closer to the EA survey ratio of 2 M:F. This data seems to indicate a large recent change in gender ratios of people entering the AIS field. Did you find that your AIS survey respondents with more AIS experience were significantly more mal
Josh Jacobson (17h)
Epistemic status: just speculation, from a not very concrete memory, written hastily on mobile after a quick skim of the post.

----------------------------------------

My guess is that these results should be taken with a large grain of salt, but if I'm wrong, I'd be interested in hearing more about why. Specifically, I think the "alignment researcher" population and "org leader" populations here are probably a far departure from what people envision when they hear these terms. I also expect other populations reported on to have a directionally similar skew to what I speculate below.

An anecdote for why I expect that (some aspects may be off):

* I started the survey, based off the description that it'd be decently short. I found it long, involved, and asking various questions (marked as required) that I really wasn't interested in answering (nor interested in the results of). IIRC it also had various ways in which the question phrasing was lacking. I accordingly abandoned it, while seeing there was still a long way to go to completion. One additional factor for my abandoning it was that I couldn't imagine it drawing a useful response population anyway; the sample mentioned above is a significant surprise to me (even with my skepticism around the makeup of that population). Beyond the reasons I already described, I felt that it being done by a for-profit org that is a newcomer and probably largely unknown would dissuade a lot of people from responding (and/or providing fully candid answers to some questions).

All in all, I expect that the respondent population skews heavily toward those who place a lower value on their time and are less involved. I expect this to generally be a more junior group, often not fully employed in these roles, with eg the average age and funding level of the orgs that are being led particularly low (and some of the orgs being more informal). That's a very legitimate and useful population to survey; I just think it also isn't at all
Cameron Berg (7h)
Here is the full list of the alignment orgs who had at least one researcher complete the survey (and who also elected to share what org they are working for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.

~80% of the alignment sample is currently receiving funding of some form to pursue their work, and ~75% have been doing this work for >1 year. Seems to me like this is basically the population we were intending to sample.

Your expectation while taking the survey about whether we were going to be able to get a good sample does not say much about whether we did end up getting a good sample. Things that better tell us whether or not we got a good sample are, eg, the quality/distribution of the represented orgs and the quantity of actively-funded technical alignment researchers (both described above).

Note that the survey took people ~15 minutes to complete and resulted in a $40 donation being made to a high-impact organization, which puts our valuation of an hour of their time at ~$160 (roughly equivalent to the hourly rate of someone who makes ~$330k annually). Assuming this population would generally donate a portion of their income to high-impact charities/organizations by default, taking the survey actually seems to probably have been worth everyone's time in terms of EV.
