A list of some contrarian takes I have:

* People are currently predictably too worried about misuse risks.
* What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.
* Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.
* Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.
* Much research on deception (Anthropic's recent work, trojans, jailbreaks, etc.) is not targeting "real" instrumentally convergent deception reasoning, but learned heuristics. Not bad in itself, but IMO this places heavy asterisks on the results they can get.
* ML robustness research (like FAR Labs' Go stuff) does not help with alignment, and helps moderately with capabilities.
* The field of ML is a bad field to take epistemic lessons from. Note I don't talk about the results from ML.
* ARC's MAD seems doomed to fail.
* People in alignment put too much faith in the general factor g. It exists, and is powerful, but is not all-consuming or all-predicting. People are often very smart, but lack social skills, or agency, or strategic awareness, etc., and vice versa. They can also be very smart in a particular area, but dumb in other areas. This is relevant for hiring & deference, but less for object-level alignment.
* People are too swayed by rhetoric in general, and those in alignment, rationality, & EA are too, though in different ways and admittedly to a lesser extent than the general population. People should fight against this more than they seem to (which is not really at all, except for the most overt of cases). For example, I see nobody saying they don't change their minds on account of Scott Alexander because he's too powerful a rhetorician. Ditto for Eliezer, since he is also a great rhetorician. In contrast, Robin Hanson is a famously terrible rhetorician, so people should listen to him more.
* There is a technocratic tendency in strategic thinking around alignment (I think partially inherited from OpenPhil, but also smart people are likely just more likely to think this way) which biases people towards simpler & more brittle top-down models without recognizing how brittle those models are.

---

1. A non-exact term ↩︎
Quote from Cal Newport's Slow Productivity book: "Progress in theoretical computer science research is often a game of mental chicken, where the person who is able to hold out longer through the mental discomfort of working through a proof element in their mind will end up with the sharper result."
Thomas Kwa
I started a dialogue with @Alex_Altair a few months ago about the tractability of certain agent foundations problems, especially the agent-like structure problem. I saw it as insufficiently well-defined to make progress on anytime soon. I thought the lack of similar results in easy settings, the fuzziness of the "agent"/"robustly optimizes" concept, and the difficulty of proving things about a program's internals given its behavior all pointed against working on this. But it turned out that we maybe didn't disagree on tractability much, it's just that Alex had somewhat different research taste, plus thought fundamental problems in agent foundations must be figured out to make it to a good future, and therefore working on fairly intractable problems can still be necessary. This seemed pretty out of scope and so I likely won't publish. Now that this post is out, I feel like I should at least make this known. I don't regret attempting the dialogue, I just wish we had something more interesting to disagree about.
We recently released an interview with independent scholar John Wentworth: It mostly centers around two themes: "abstraction" (forming concepts) and "agency" (dealing with goal-directed systems).  Check it out!
RobertM
EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI.

I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for, e.g., pre-deployment testing of frontier models. Is there any concrete evidence about what commitment was made, if any? The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time.

If people are going to breathe down the necks of AGI labs about keeping to their commitments, they should be careful to only do it for commitments they've actually made, lest they weaken the relevant incentives. (This is not meant to endorse AGI labs behaving in ways which cause strategic ambiguity about what commitments they've made; that is also bad.)


Recent Discussion

Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.

However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could’ve seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.)

Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.

First, here are the examples that prompted me to take notice:

Predicting the course of the Covid pandemic:

  • I didn’t foresee the contribution from sociological factors (e.g., “people not wanting
...

Here's something that I suspect a lot of people are skeptical of right now but that I expect will become increasingly apparent over time (with >50% credence): slightly smarter-than-human software AIs will initially be safe and highly controllable by virtue of not having a physical body and not having any social and/or legal rights.

In other words, "we will be able to unplug the first slightly smarter-than-human AIs if they go rogue", and this will actually be a strategically relevant fact, because it implies that we'll be able to run extensive experiment...

Seth Herd
Forecasting is hard. Forecasting in a domain that includes human psychology, society-level propagation of beliefs, development of entirely new technology, and understanding how a variety of minds work in enough detail to predict not only what they'll do but how they'll change - that's really hard.

So, should we give up, and just prepare for any scenario? I don't think so. I think we should try harder. That involves spending more individual time on it, and doing more collaborative prediction with people of different perspectives and different areas of expertise.

On the object level: I think it's pretty easy to predict now that we'll have more ChatGPT moments, and the Overton window will shift farther. In particular, I think interacting with a somewhat competent agent with self-awareness will be an emotionally resonant experience for most people who haven't previously imagined in detail that such a thing might exist soon.
Stefan_Schubert
Thanks for this thoughtful article. It seems to me that the first and the second examples have something in common, namely an underestimate of the degree to which people will react to perceived dangers. I think this is fairly common in speculations about potential future disasters, and have called it sleepwalk bias. It seems like something that one should be able to correct for. I think there is an element of sleepwalk bias in the AI risk debate. See this post where I criticise a particular vignette.
faul_sname
I think one missing dynamic is "tools that an AI builds won't only be used by the AI that built them", and so looking at what an AI from 5 years in the future would do with tools from 5 years in the future, if it were dropped into the world of today, might not give a very accurate picture of what the world will look like in 5 years.

It was a remarkably quiet announcement. We now have Alpha Fold 3, which does a much improved job of predicting all of life's molecules and their interactions. It feels like everyone including me then shrugged and went back to thinking about other things. No cool new toy for most of us to personally play with, no existential risk impact, no big trades to make, ho hum.

But yes, when we look back at this week, I expect what we remember will be Alpha Fold 3.

Unless it turns out that it is Sophon, a Chinese technique to potentially make it harder to fine-tune an open model in ways the developer wants to prevent. I do not expect this to get the job done that needs doing, but...

mishka

This also points out that Arena tells you what model is Model A and what is Model B. That is unfortunate, and potentially taints the statistics.

No, https://chat.lmsys.org/ says this:

  • Ask any question to two anonymous models (e.g., ChatGPT, Claude, Llama) and vote for the better one!
  • You can chat for multiple turns until you identify a winner.
  • Votes won't be counted if model identities are revealed during the conversation.

So one can choose to know the names of the models one is talking with, but then one's votes will not be counted for the statistics.

Askwho
I've produced a multi-voiced, ElevenLabs-quality AI narration episode for this post: https://askwhocastsai.substack.com/p/ai-63-introducing-alpha-fold-3-by

Somerville historically had a zoning ordinance limiting housing units to at most four unrelated people:

any number of persons related by blood, marriage, adoption, or foster care agreement and up to three (3) additional unrelated persons living together as a single housekeeping unit

This is something I'd been unhappy about for years, and was enthusiastic about the "4 unrelated is outdated" campaign to change it in 2018. So I'm very happy that, after a request for a variance, the city council instead ended up removing the restriction.

The actual change was in November, so I'm a bit late on this!

I also think there was an oversight, where the removal didn't include changing the text in section 7-153 which says "All schools shall be responsible for publicizing to their students the limitations of the city's zoning ordinance which limits occupancy to not...

Nice; Colorado recently passed a statewide law that finally does away with a similar "U+2" rule in my own town of Fort Collins (as well as other such rules in Boulder and elsewhere). To progress!

AnthonyC
Nicely done, it really is an absurd rule. When I graduated and moved in with my then significant other and two roommates in Medford, back in 2009, this is why I had to be kept off the lease. (As a side effect, this made me ineligible for a resident parking permit for the first few months, but the town had no problem giving me tickets for using a non-resident parking permit as a resident).

The curious tale of how I mistook my dyslexia for stupidity - and talked, sang, and drew my way out of it. 

Sometimes I tell people I’m dyslexic and they don’t believe me. I love to read, I can mostly write without error, and I’m fluent in more than one language.

Also, I don’t actually technically know if I’m dyslexic, because I was never diagnosed. Instead I thought I was pretty dumb, but that if I worked really hard no one would notice. Later I felt inordinately angry that anyone could possibly care about the exact order of letters when the gist is perfectly clear even if if if I right liike tis.

I mean, clear to me anyway.

I was 25 before it dawned on me that all the tricks...

This was really interesting! You probably already know this, but reading out loud was the norm, and silent reading unusual, for most of history: https://en.wikipedia.org/wiki/Silent_reading That didn't really start to change until well after the invention of the printing press.

For most of my life, even now once in a while, I would subvocalize my own inner monologue. Definitely had to learn to suppress that in social situations.

gwern
Classic 'typical mind' like experience: https://www.lesswrong.com/posts/baTWMegR42PAsH9qJ/generalizing-from-one-example
Lorxus
This is totally true. I am a professional mathematician, and I also have a strong "mental voice". Whenever I read mathematical texts/research papers with equations inline, I totally read the equations aloud in my head. It makes me wonder to what extent being dyslexic for English (or other written natural languages) fails to co-occur with being dyslexic for math-tongue (as distinct from dyscalculia, which AIUI has to do mostly with disability at mental calculation and mental manipulation of quantitative facts).

Also, I can read Korean and have had the distinct sensation, very early on, of it being harder to make myself care about the differences between the characters; similarly, when practicing Chinese characters in class, I've seen a lot of classmates have a very hard time because they suddenly have to treat the characters like they're pictures, without even having the mental technology for how to do that correctly. So I wonder how much of dyslexia transfers cross-linguistically! Are there people who can read Cyrillic and Greek, but not Latin script or Hebrew? Who knows!
Shoshannah Tekofsky
aaaaw thank you for saying that! I appreciate it!
This is a linkpost for http://Less.Online/

A Festival of Writers Who are Wrong on the Internet[1]

LessOnline is a festival celebrating truth-seeking, optimization, and blogging. It's an opportunity to meet people you've only ever known by their LessWrong username or Substack handle.

We're running a rationalist conference!

The ticket cost is $400 minus your LW karma in cents.
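For concreteness, here's a minimal sketch of that pricing rule (this is my illustrative reading; in particular, flooring the price at $0 for very-high-karma attendees is an assumption, not something stated in the announcement):

```python
def ticket_price(lw_karma: int) -> float:
    # $400 minus LW karma in cents; the $0 floor is an assumption, not stated above.
    return max(0.0, 400.0 - lw_karma / 100)

print(ticket_price(5_000))   # 5,000 karma -> $350.00
print(ticket_price(50_000))  # 50,000 karma -> $0.00 under the assumed floor
```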

Confirmed attendees include Scott Alexander, Zvi Mowshowitz, Eliezer Yudkowsky, Katja Grace, and Alexander Wales.

Less.Online

Go through to Less.Online to learn about who's attending, venue, location, housing, relation to Manifest, and more.

We'll post more updates about this event over the coming weeks as it all comes together.

If LessOnline is an awesome rationalist event,
I desire to believe that LessOnline is an awesome rationalist event;

If LessOnline is not an awesome rationalist event,
I desire to believe that LessOnline is not an awesome rationalist event;

Let me not become attached to beliefs I may not want.

      —Litany of Rationalist Event Organizing

  1. ^

    But Striving to be Less So

Error
I see no general-inquiries address on less.online, so I hope it's okay if I post my questions here. Longtime rationalsphere lurker, much rarer poster, considering going. I'm based in Atlanta and pricing the trip:

1. It's not clear from the page what the 'summer camp' part of the schedule is. To me that phrase connotes 'for kids', but I'm not sure, so I'm not sure whether it's worth it to me to stay extra days. [EDIT: Never mind, I see this was answered below.]
2. What's the nearest major airport, and is Lighthaven accessible from there by train and/or uber? (i.e. will I need a rental car)
3. What's the washroom situation like for the shared dorms vs. private rooms? (I take long showers and worry about blocking others.)
4. Most of the shared dorms don't list a building, just "shared dorms". Where are they relative to the rest of the facility?
5. What's local connectivity like? (I can stay longer if I can plausibly respond to work emergencies.)

[EDIT: And, I just realized this announcement was over a month ago; I'm not sure how I missed it at the time or why I just noticed it now. Dunno if you or anyone else will see this. I'll wait a day in case someone does, then try a PM or something.]

Thanks for the questions, and I hope you make it! Here are my answers, happy to answer more/follow-ups.

  1. Summer Camp is our name for the chill period between the two festival weekends. We've already sold ~100 tickets just for Summer Camp, so it's gonna be active.
  2. SFO and OAK. You can take BART from SFO and then walk the last 20 mins. Google Maps says that there's also public transport from Oakland (which, to be clear, is a fair bit closer than SFO). I haven't gone that route before; I think it might involve buses.
  3. There's a few places people will be sleepin
...
Ben Pace
I did find it and we sent him an email, hope he reads it and joins :)


Seth Herd
I think future more powerful/useful AIs will understand our intentions better IF they are trained to predict language. Text corpuses contain rich semantics about human intentions. I can imagine other AI systems that are trained differently, and I would be more worried about those. That's what I meant by current AI understanding our intentions possibly better than future AI.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

nim
I notice that I am confused: an image of lily pads appears on https://www.lesswrong.com/s/XJBaPPEYAPeDzuAsy when I load it, but when I expand all community sequences on https://www.lesswrong.com/library (a show-all button might be nice...) and search the string "physical" or "necessity" on that page, I do not see the post appearing. This seems odd: I'd expect "displays a non-default image when the sequence's homepage is loaded" and "has a good enough image to appear in the list" to be the same condition, but it seems they aren't identical for that one.

There are two images provided for a sequence: the banner image and the card image. The card image is required for the sequence to show up in the Library.

nim
I am delighted that you chimed in here; these are pleasingly composed and increase my desire to read the relevant sequences. Your post makes me feel like I meaningfully contributed to the improvement of these sequences by merely asking a potentially dumb question in public, which is the internet at its very best. Artistically, I think the top (fox face) image for lotteries, cropped to its bottom 2/3, would be slightly preferable to the other, and the bottom (monochrome white/blue) image for geometric makes a nicer banner in the aspect ratio they're shown at.
RobertM
EDIT: looks like habryka got there earlier and I didn't see it. https://www.lesswrong.com/posts/zXJfH7oZ62Xojnrqs/#sLay9Tv65zeXaQzR4 Intercom is indeed hidden on mobile (since it'd be pretty intrusive at that screen size).

1. If you find that you’re reluctant to permanently give up on to-do list items, “deprioritize” them instead

I hate the idea of deciding that something on my to-do list isn’t that important, and then deleting it off my to-do list without actually doing it. Because once it’s off my to-do list, then quite possibly I’ll never think about it again. And what if it’s actually worth doing? Or what if my priorities change such that it will be worth doing at some point in the future? Gahh!

On the other hand, if I never delete anything off my to-do list, it will grow to infinity.

The solution I’ve settled on is a priority-categorized to-do list, using a kanban-style online tool (e.g. Trello). The left couple columns (“lists”) are very active—i.e., to-do list...

MondSemmel
I've found that there's value in having short to-do lists, because short lists fit much better into working memory and are thus easier to think about. If items are deprioritized rather than properly deleted from the system, this increases the total number of to-dos one could think about. On the other hand, maybe moving tasks to offscreen columns is sufficient to get them off one's mind? It seems to me like an approach that is both easier and more comprehensive would be to use a text editor with proper version control and diff features, and then to name particular versions before making major changes.

short lists fit much better into working memory

IMO the main point of a to-do list is to not have the to-do list in working memory. The only thing that should be in working memory is the one thing you're actually supposed to be focusing on and doing, right now. Right?

Or if you're instead in the mode of deciding what to do next, or making a schedule for your day, etc., then that's different, but working memory is still kinda irrelevant because presumably you have your to-do list open on your computer, right in front of your eyes, while you do that, right?...

Here's something I've been pondering.

Hypothesis: if a transformer has internal concepts, and they are represented in the residual stream, then because we have access to 100% of that information, it should be possible for a non-linear probe to get 100% out-of-distribution accuracy. The 100% matters because we care about how something like value learning will generalise OOD.

And yet we don't get 100% (in fact most reported metrics are on much easier settings than the ones we care about: in-distribution, or on carefully constructed setups). Which of the hypothesis's assumptions is wrong, do you think?
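For concreteness, here's a minimal sketch of the kind of probing experiment this hypothesis points at, using synthetic stand-in features rather than real residual-stream activations; the MLP probe, the toy "concept", and the distribution shift below are illustrative assumptions, not anything from the original comment:

```python
# Sketch: train a non-linear probe on stand-in "residual stream" activations
# in-distribution, then check how well its accuracy transfers out of distribution.
# In a real experiment, X would be residual-stream activations from a transformer
# and y would be labels for the concept of interest.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d_model = 64                          # hypothetical residual-stream width
direction = rng.normal(size=d_model)  # toy "concept": a single linear direction

# In-distribution data: standard-normal activations.
X_train = rng.normal(size=(2000, d_model))
y_train = (X_train @ direction > 0).astype(int)

# Out-of-distribution data: same concept, but shifted and rescaled activations.
X_ood = 2.0 * rng.normal(size=(500, d_model)) + 1.5
y_ood = (X_ood @ direction > 0).astype(int)

probe = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
probe.fit(X_train, y_train)

print("train (in-distribution) accuracy:", probe.score(X_train, y_train))
print("OOD accuracy:                    ", probe.score(X_ood, y_ood))
```

The hypothesis is asking why, with real activations (where the probe in principle sees 100% of the information), the OOD number doesn't reach 100%.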

wassname
New observations > new thoughts when it comes to calibrating yourself. The best-calibrated people are those who get lots of interaction with the real world, not those who think a lot or have a complicated inner model. Tetlock's superforecasters were gamblers and weathermen.

LessOnline & Manifest Summer Camp

June 3rd to June 7th

Between LessOnline and Manifest, stay for a week of experimental events, chill coworking, and cozy late night conversations.

Prices rise by $100 on May 13th.