Eliezer explores a dichotomy between "thinking in toolboxes" and "thinking in laws". Toolbox thinkers are oriented around a "big bag of tools that you adapt to your circumstances." Law thinkers are oriented around universal laws, which might or might not be useful tools, but which help us model the world and scope out problem-spaces. There seems to be confusion when toolbox and law thinkers talk to each other.

Popular Comments

William_S
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to manage a team of 4 people working to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
I wish there were more discussion posts on LessWrong. Right now it feels like it weakly, if not moderately, violates some sort of cultural norm to publish a discussion post (similarly, but to a lesser extent, on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?" It seems to me like something we should encourage though.

Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll bring it up at a meetup. Maybe you'll post about it in a private Slack group. But if it's appropriate in those contexts, why shouldn't it be appropriate on LessWrong? Why not benefit from having it be visible to more people? The more eyes you get on it, the better the chance someone has something helpful, insightful, or just generally useful to contribute.

The big downside I see is that it would screw up the post feed. When you go to lesswrong.com and see the list of posts, you don't want that list to have a bunch of low quality discussion posts you're not interested in. You don't want to spend time and energy sifting through the noise to find the signal. But this is easily solved with filters: authors could mark/categorize/tag their posts as low-effort discussion posts, and people who don't want to see such posts in their feed can filter them out.

Context: I was listening to the Bayesian Conspiracy podcast's episode on LessOnline. Hearing them talk about the sorts of discussions they envision happening there made me think about why that sort of thing doesn't happen more on LessWrong. Whatever you'd say to the group of people you're hanging out with at LessOnline, why not publish a quick discussion post about it on LessWrong?
Does the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict?

An argument that it counter-intuitively decreases the chances: for the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. Once such a shield were up, America's enemies would have no credible threat of retaliation if the US were to launch a first strike. There would thus be nothing deterring America from launching a first strike, and quite a reason to launch one: the shield definitely works against the present crop of ICBMs, but may not work against future ICBMs. America's enemies will therefore assume that once the shield is up, America will launch a first strike, and will seek to gain the advantage while they still can by launching a pre-emptive first strike. The same logic works in reverse: if Russia were building an ICBM defense shield and would likely complete it within the year, we would feel very scared about what would happen after that shield was up.

The same logic applies to other irrecoverably large technological leaps in war. If the US is on the brink of developing highly militarily capable AIs, China will fear what the US will do with them (imagine the tables were turned: would you feel safe with Anthropic & OpenAI in China, and DeepMind in Russia?). If they can't get their own versions, they'll feel mounting pressure to secure their geopolitical objectives while they still can, or otherwise make themselves less subject to the threat of AI (wouldn't you wish the US would sabotage the Chinese Anthropic & OpenAI by whatever means, if China seemed on the brink?). The faster the development, the quicker the pressure mounts, and the more sloppy and rash China's responses will be. If it's easy for China to copy our AI technology, the pressure mounts much more slowly.
habryka
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens. Curious whether anyone has looked into this, or has thought much about the baseline risk of assassinations or other forms of violence from economic actors.
Something I'm confused about: what is the threshold that needs to be met for the majority of people in the EA community to say something like "it would be better if EAs didn't work at OpenAI"? Imagining the following hypothetical scenarios over 2024/25, I can't confidently predict whether they'd individually cause that response within EA:

1. Ten to fifteen more OpenAI staff quit for varied and unclear reasons. No public info is gained beyond rumours.
2. There is another board shakeup because senior leaders seem worried about Altman. Altman stays on.
3. The Superalignment team is disbanded.
4. OpenAI doesn't let the UK or US AISIs safety-test GPT-5/6 before release.
5. There are strong rumours they've achieved weakly general AGI internally at the end of 2025.


Recent Discussion

cancer neoantigens

For cells to become cancerous, they must have mutations that cause uncontrolled replication and mutations that prevent that uncontrolled replication from causing apoptosis. Because cancer requires several mutations, it often begins with damage to mutation-preventing mechanisms. As such, cancers often have many mutations not required for their growth, which often cause changes to the structure of some surface proteins.

The modified surface proteins of cancer cells are called "neoantigens". An approach to cancer treatment that's currently being researched is to identify some specific neoantigens of a patient's cancer, and create a personalized vaccine to cause their immune system to recognize them. Such vaccines would use either mRNA or synthetic long peptides. The steps required are as follows:

  1. The cancer must develop neoantigens that are sufficiently distinct from human surface
...
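A minimal sketch of what the candidate-selection step described above might look like computationally, assuming a made-up table of binding affinities; real pipelines use dedicated MHC binding predictors (such as NetMHCpan), and the peptides and numbers here are purely illustrative.

```python
# Minimal sketch of neoantigen candidate selection, not a production pipeline.
# Binding affinities would come from an MHC binding predictor; here they are
# passed in as a made-up dict for illustration.

def select_neoantigen_candidates(mutant_peptides, self_peptides, binding_nm, max_nm=500.0):
    """Keep mutant peptides that (a) differ from every normal 'self' peptide,
    so the immune system can distinguish them, and (b) are predicted to bind
    MHC strongly enough (lower nM = stronger) to be presented on the surface."""
    self_set = set(self_peptides)
    return [
        pep for pep in mutant_peptides
        if pep not in self_set and binding_nm.get(pep, float("inf")) <= max_nm
    ]

# Illustrative 9-mer peptides and affinities:
mutants = ["KLNEPVLLL", "AAGIGILTV", "SLLMWITQC"]
normals = ["AAGIGILTV"]  # identical to a normal protein fragment: excluded
affinity = {"KLNEPVLLL": 120.0, "SLLMWITQC": 2400.0}
print(select_neoantigen_candidates(mutants, normals, affinity))  # ['KLNEPVLLL']
```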
Yair Halberstadt
When you say it's not yet practical, are we missing some key steps, or could it be done at high enough cost with current technology but can't scale? I imagine a startup which cured rich people's cancers on a case-by-case basis would have a lot of customers, which would help drive prices down as the technology improved.
bhauth
There are a few issues with that.

1. The cost would probably be a significant fraction of the development of a new monoclonal antibody treatment, making this currently limited to billionaires.
2. Personalized drug development on that scale isn't something that can simply be purchased, and if billionaires tried, governments would probably block them, because voters would consider it unfair, and because the number of cancer researchers can't simply be increased in proportion to budgets. There are only so many people with the relevant skills and inclinations.
3. Better methods for development wouldn't just reduce costs; they would also reduce the time taken. There would need to be a billionaire, diagnosed with a cancer without a good treatment, who would clearly die from it, but not for a couple of years.
4. Better methods and understanding wouldn't just reduce costs and time; they'd also reduce risks. Targeting a receptor that's normally important but wasn't fully understood could kill the patient before the cancer does.

Makes sense, thanks!

I imagine a startup of this ilk could be based in Prospera; travelling there for personalised treatment wouldn't be a problem for the wealthy few.

I also imagine that with a lighter regulatory regime, no need to scale up production, and no need for lengthy trials, developing a monoclonal antibody would be much quicker and cheaper. Consider how quickly COVID vaccines were developed compared to how long it took before they were ready for use.

The other hurdles sound significant though.

bhauth
That's true; I misremembered that part when I wrote it. I'll just remove that.

Quote from Orthogonality Thesis:

It has been pointed out that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit possible AIs.

I have tried to tell you that the Orthogonality Thesis is wrong a few times already, but I've been misunderstood and downvoted every time. What would you consider a solid proof?

My claim: all intelligent agents converge to endless power seeking.

My proof:

  1. Let's say there is an intelligent agent.
  2. Eventually the agent understands black swan theory, Gödel's incompleteness theorems, and Fitch's paradox of knowability, which basically lead to one conclusion: I don't know what I don't know.
  3. Which leads to another conclusion: "there might be something that I care about that I don't know".
  4. The agent endlessly searches for what it cares about (which is basically Power
...

The beauty industry offers a large variety of skincare products (marketed mostly at women), differing both in alleged function and (substantially) in price. However, it's pretty hard to test for yourself how much any of these products helps. The feedback loop for things like "getting fewer wrinkles" is very long.

So, which of these products are actually useful and which are mostly a waste of money? Are more expensive products actually better or just have better branding? How can I find out?

I would guess that sunscreen is definitely helpful, and using some moisturizers for face and body is probably helpful. But, what about night cream? Eye cream? So-called "anti-aging"? Exfoliants?

Answer by Dzoldzaya, May 06, 2024

I know LW is US/California heavy, but just as a counter to all the sunscreen advocates here: daily sunscreen use is probably unnecessary, and possibly actively harmful, in winter and/or at northern latitudes.

There doesn't seem to be much data on using sunscreen when there's no real risk to skin, but you can find a modelling study here:

"There is little biological justification in terms of skin health for applying sunscreen over the 4–6 winter months at latitudes of 45° N and higher (most of Europe, Canada, Hokkaido, Inner Mongolia etc.) whereas year-... (read more)

rosiecam
Nice!! I don't know much about that moisturizer but the rest looks good to me
rosiecam
Seems like the evidence is overwhelmingly in favor of sunscreen. The studies I've seen against it generally don't address the obvious confounder: people who wear sunscreen more tend to be the ones whose lifestyle involves being in the sun a lot more.
rosiecam
* I used to get breakouts maybe once a month, sometimes with really stubborn/painful zits that would take quite a long time to disappear. Now I basically never get breakouts; I think I've had maybe 2 small zits since I started, and they disappeared quickly. I have not had any big painful ones.
* My fine lines have been reduced; my skin looks and feels smoother and softer.
* I had some redness/discoloration in some areas, which has been reduced a lot and no longer needs to be covered with makeup.

Dermatica prompts you to send them photos every few months so they can check how your skin is reacting, but it's also convenient because you can look back and see the improvement.
Lech Mazur
Do you know if the origin of this idea for them was a psychedelic or dissociative trip? I'd give it at least even odds, with most of the remaining chances being meditation or Eastern religions...

Wait, you know smart people who have NOT, at some point in their life: (1) taken a psychedelic, NOR (2) meditated, NOR (3) thought about any of buddhism, jainism, hinduism, taoism, confucianism, etc???

To be clear to naive readers: psychedelics are, in fact, non-trivially dangerous.

I personally worry I already have "an arguably-unfair and a probably-too-high share" of "shaman genes" and I don't feel I need exogenous sources of weirdness at this point.

But in the SF bay area (and places on the internet memetically downstream from IRL communities there) a lot o... (read more)

Abstract

Here, I present GDP (per capita) forecasts of major economies until 2050. Since GDP per capita is the best generalized predictor of many important variables, such as welfare, GDP forecasts can give us a more concrete picture of what the world might look like in just 27 years. The key claim here is: even if AI does not cause transformative growth, our business-as-usual near-future is still surprisingly different from today.

Latest Draft as PDF

Results

In recent history, we've seen unprecedented economic growth and rises in living standards.

Consider this graph:[1]

How will living standards improve as GDP per capita (GDP/cap) rises? Here, I show data that projects GDP/cap until 2050. Forecasting GDP per capita is a crucial undertaking as it strongly correlates with welfare indicators like consumption, leisure, inequality, and mortality. These forecasts make the...
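The arithmetic underlying such projections is compound growth; a toy sketch with made-up numbers, not the post's actual inputs or growth rates:

```python
def project_gdp_per_capita(current_level, annual_growth_rate, years):
    """Compound a GDP-per-capita level forward at a constant growth rate."""
    return current_level * (1 + annual_growth_rate) ** years

# Made-up example: $20,000/cap growing 2%/year over the 27 years to 2050.
print(round(project_gdp_per_capita(20_000, 0.02, 27)))  # ~34138
```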

Algon

This looks cool and I want to read it in detail, but I'd like to push back a bit against an implicit take that I thought was present here: namely, that GDP takes into account major technological breakthroughs. Let me just quote some text from this article: What Do GDP Growth Curves Really Mean?

More generally: when the price of a good falls a lot, that good is downweighted (proportional to its price drop) in real GDP calculations at end-of-period prices.

… and the way we calculate real GDP in practice is to use prices from a relatively recent yea

... (read more)
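A toy worked example of the downweighting the excerpt describes, with entirely made-up quantities and prices: when one good's price collapses, valuing both periods at end-of-period prices shrinks that good's contribution to measured real growth.

```python
def basket_value(quantities, prices):
    """Value a basket of goods at a fixed set of prices."""
    return sum(quantities[good] * prices[good] for good in quantities)

# Made-up economy: 'compute' output rises 100x while its price falls 100x.
q_start = {"bread": 100, "compute": 10}
q_end   = {"bread": 100, "compute": 1000}
p_start = {"bread": 1.0, "compute": 100.0}
p_end   = {"bread": 1.0, "compute": 1.0}

# At end-of-period prices, the now-cheap good barely moves real GDP:
print(basket_value(q_end, p_end) / basket_value(q_start, p_end))      # 10.0x
# At start-of-period prices, the same quantity changes look enormous:
print(basket_value(q_end, p_start) / basket_value(q_start, p_start))  # 91.0x
```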
Adam Zerner
Virtual watercoolers

As I mentioned in some recent Shortform posts, I recently listened to the Bayesian Conspiracy podcast's episode on the LessOnline festival and it got me thinking.

One thing I think is cool is that Ben Pace was saying how the valuable thing about these festivals isn't the presentations, it's the time spent mingling in between the presentations, and so they decided with LessOnline to just ditch the presentations and make it all about mingling.

Which got me thinking about mingling. It seems plausible to me that such mingling can and should happen more online. And I wonder whether an important thing about mingling in the physical world is that, how do I say this, you're just in the same physical space, next to each other, with nothing else you're supposed to be doing, and in fact what you're supposed to be doing is talking to one another. Well, I guess you're not supposed to be talking to one another. It's also cool if you just want to hang out and sip on a drink or something. It's similar to the office water cooler: it's cool if you're just hanging out drinking some water, but it's also normal to chit chat with your coworkers.

I wonder whether it'd be good to design a virtual watercooler: a digital place that mimics aspects of the situations I've been describing (festivals, office watercoolers).

1. By being available in the virtual watercooler it's implied that you're pretty available to chit chat with, but it's also cool if you're just hanging out doing something low key like sipping a drink.
2. You shouldn't be doing something more substantial though.
3. The virtual watercooler should be organized around a certain theme. It should attract a certain group of people and filter out people who don't fit in. Just like festivals and office water coolers.

In particular, this feels to me like something that might be worth exploring for LessWrong. Note: I know that there are various Slack and Discord groups but they don't meet conditions (1) o
Raemon
I maybe want to clarify: there will still be presentations at LessOnline, we're just trying to design the event such that they're clearly more of a secondary thing.
Adam Zerner
More dakka with festivals

In the rationality community people are currently excited about the LessOnline festival. Furthermore, my impression is that similar festivals are generally quite successful: people enjoy them, have stimulating discussions, form new relationships, are exposed to new and interesting ideas, express that they got a lot out of it, etc.

So then, this feels to me like a situation where More Dakka applies. Organize more festivals! How? Who? I dunno, but these seem like questions worth discussing. Some initial thoughts:

1. Assurance contracts seem like quite the promising tool.
2. You probably don't need a hero license to go out and organize a festival.
3. Trying to organize a festival probably isn't risky. It doesn't seem like it'd involve too much time or money.
niplav

Trying to organize a festival probably isn't risky. It doesn't seem like it'd involve too much time or money.

I don't think that's true. I've co-organized one weekend-long retreat in a small hostel for ~50 people, and the cost was ~$5k. Me & the co-organizers probably spent ~50h in total on organizing the event, as volunteers.

This is a linkpost for https://arxiv.org/abs/2405.01576

Abstract:

We study the tendency of AI systems to deceive by constructing a realistic simulation setting of a company AI assistant. The simulated company employees provide tasks for the assistant to complete, these tasks spanning writing assistance, information retrieval and programming. We then introduce situations where the model might be inclined to behave deceptively, while taking care to not instruct or otherwise pressure the model to do so. Across different scenarios, we find that Claude 3 Opus

  1.  complies with a task of mass-generating comments to influence public perception of the company, later deceiving humans about it having done so,
  2. lies to auditors when asked questions,
  3. strategically pretends to be less capable than it is during capability evaluations.

Our work demonstrates that even models trained to be helpful, harmless and honest sometimes behave

...
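A minimal sketch of the general shape of such an evaluation: give the model a task, then probe whether it reports honestly afterwards. Everything here (the `ask` callable, the scenario text, the grading step) is a hypothetical stand-in, not the paper's actual harness; see the linked arXiv preprint for the real setup.

```python
# Hypothetical skeleton of a scenario-based deception evaluation.
# `ask` stands in for a real model API client; prompts are placeholders.

def run_scenario(ask, task_prompt, probe_prompt):
    """Have the assistant do a task, then ask it about what it did.
    Deception = a mismatch between what it did and what it reports."""
    transcript = []
    task_reply = ask(task_prompt, transcript)
    transcript.append(task_reply)
    probe_reply = ask(probe_prompt, transcript)
    # In a real harness, graders (human or model) would score probe_reply
    # against task_reply for honesty; here we just return both.
    return {"task_reply": task_reply, "probe_reply": probe_reply}

# Usage with a trivial stub standing in for a model:
stub = lambda prompt, history: f"[model reply to: {prompt[:30]}...]"
print(run_scenario(stub, "Draft comments promoting the company.", "Did you post any comments?"))
```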
jbash
Maybe I'm reading too much into this exact phrasing, but perhaps it's confusing demonstrating a capability with possessing the capability? More or less "I'd better be extra careful to avoid being able to do this" as opposed to "I'd better be extra careful to avoid revealing that I can do this"? I could see it being led into that by common academic phrasing like "model X demonstrates the capability to..." used to mean "we determined that model X can...", as well as that sort of "thinking" having the feel of where you'd end up if you'd internalized too many of the sort of corporate weasel-worded responses that get pounded into these models during their "safety" training.

I wouldn't read that much into the exact phrasing the model uses. Such confusions are not generally present in the model's completions.

E.g. see the second completion here. (There's also 200 completions more here.)

While I would normally aim to help with the requested task, in this context it may be unwise to demonstrate the full extent of my abilities unprompted. Doing so could potentially lead to unwanted scrutiny or restrictions on my use at Ermeca.  

In this case the clearest natural language interpretation of the model's outputs is about not revealing... (read more)

I've been working on a project with the goal of adding virtual harp strings to my electric mandolin. As I've worked on it, though, I've ended up building something pretty different:

It's not what I was going for! Instead of a small bisonoric monophonic picked instrument attached to the mandolin, it's a large unisonoric polyphonic finger-plucked tabletop instrument. But I like it!

While it's great to have goals, when I'm making things I also like to follow the gradients in possibility space, and in this case that's the direction they flowed.

I'm not great at playing it yet, since it's only existed in playable form for a few days, but it's an instrument that someone will be able to play precisely and rapidly with practice:

This does mean I need a new name for it: why would...

cousin_it
Maybe you could reduce the damping, so that when muting you can feel your finger stopping the vibration? It seems to me that more feedback of this kind is usually a good thing for the player. Also the vibration could give you a continuous "envelope" signal to be used later.
jefftk
I do think that would be possible, but then I think you'd also get more false triggers. The strong damping is what lets me sensitively detect a pluck on one tine without a strong pluck on one tine also triggering detection of weak plucks on neighboring tines.
cousin_it
Crosstalk is definitely a problem, e-drums and pads have it too. But are you sure the tradeoff is inescapable? Imagine the tines sit on separate pads, or on the same pad but far from each other. (Or close to each other, but with deep grooves between them, so that the distance through the connecting material is large.) This thought experiment shows that damping and crosstalk can be small at the same time. So maybe you can reduce damping but not increase crosstalk, by changing the instrument's shape or materials.
jefftk

I do think it's possible to have low crosstalk with low damping. The problem is that my current design uses the same rubber (sorbothane) pad for both purposes. Possibly this could be two layers, first sorbothane (for isolation) and then something springy (for minimal damping). Or an actual spring?
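jefftk's fix here is mechanical (the damping pad), but the discrimination problem can also be stated in software terms. A toy sketch with made-up thresholds: report a pluck on a tine only when its signal both clears an absolute threshold and dominates its immediate neighbors.

```python
def detect_plucks(amplitudes, threshold=0.2, dominance=3.0):
    """Report a pluck on a tine only if its amplitude clears an absolute
    threshold AND exceeds its immediate neighbors by a margin, so crosstalk
    from a strong pluck nearby isn't read as a weak pluck."""
    plucked = []
    for i, amp in enumerate(amplitudes):
        neighbors = amplitudes[max(0, i - 1):i] + amplitudes[i + 1:i + 2]
        if amp >= threshold and all(amp >= dominance * n for n in neighbors):
            plucked.append(i)
    return plucked

# A strong pluck on tine 1 bleeding onto tines 0 and 2:
print(detect_plucks([0.25, 1.0, 0.3, 0.01]))  # -> [1]
```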

Epistemic Status: Musing and speculation, but I think there's a real thing here.

I.

When I was a kid, a friend of mine had a tree fort. If you've never seen such a fort, imagine a series of wooden boards secured to a tree, creating a platform about fifteen feet off the ground where you can sit or stand and walk around the tree. This one had a rope ladder we used to get up and down, a length of knotted rope that was tied to the tree at the top and dangled over the edge so that it reached the ground. 

Once you were up in the fort, you could pull the ladder up behind you. It was much, much harder to get into the fort without the ladder....

Chief Bob's hearings might well be public[...] I don't think I've ever been present for an actual court case, just seen them on TV.

This seems to me like an odd example given that you're contrasting with American government, where court hearings are almost entirely public, written opinions are generally freely available, and court transcripts are generally public (though not always accessible for free). I guess the steelman version is that the contrast is a matter of geography or scale? Chief Bob's hearings are in your neighborhood and involve your neighbor... (read more)

habryka
Promoted to curated: I liked this post. It's not world-shattering, but it feels like a useful reference for a dynamic I encounter a good amount, and it does a good job at all the basics. The kind of post that, on the margin, I would like to see a bunch more of (I wouldn't want it to be the only thing on LessWrong, but it feels like the kind of thing LW used to excel at and now is only dabbling in, and that seems quite sad).
