I applied for Thomas Kwa's SPAR stream, but I have some doubts about the direction of the research, so I'm posting them here to get feedback. Kwa wants to train models to produce something close to neuralese as reasoning traces and then evaluate white-box and black-box monitoring against those traces. It seems obvious to me that once a model switches to neuralese we already know something is wrong, so why test our monitors against neuralese at all?
Wikipedia?
source on the ED risk?
do you want to stop worrying?
I think that on most websites only about 1-10% of users actually post anything. I suspect the number of people having these weird interactions with LLMs (and stopping before posting about them) is something like 10-10,000 times (most likely around 100 times) larger than what we see here.
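A minimal back-of-the-envelope sketch of that multiplier, as a Python snippet; the observed count is a made-up placeholder, and only the 1-10% participation rate and the ~100x best guess come from the estimate above:

```python
# Back-of-the-envelope lurker multiplier (illustrative numbers only).
observed_posters = 50  # hypothetical count of people we actually see posting about this

# Assumption from above: only ~1-10% of users ever post anything.
participation_low, participation_high = 0.01, 0.10

implied_min = observed_posters / participation_high  # ~10x the visible count
implied_max = observed_posters / participation_low   # ~100x the visible count
best_guess = observed_posters * 100                  # the "most likely ~100x" guess

print(f"Implied affected users: {implied_min:.0f}-{implied_max:.0f}, best guess ~{best_guess}")
```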
why?
The goals we set for AIs in training are proxy goals. We humans also set proxy goals: we use KPIs and budgets, and we talk about solving alignment or ending malaria (proxies for increasing utility and saving lives), and so on. Yet we can somehow focus on a proxy goal while keeping the higher-level goal in mind at the same time. How is this possible? How can we teach AIs to do that?
So I got what I wanted. I tried the Zed code editor and, well... it's free and very agentic. I haven't tried Cursor, but I suspect it's on about the same level.
I don't think that anymore. I think it's possible to get labs to use your work (e.g. if you devise a new eval or a new mech interp technique that solves some important problem), but it has to be good enough, and you need to find a way to communicate it. I changed my mind after EAG London.
I briefly read the 6pack.care website and your post. It sounds to me like an idea that supplements existing AI safety paradigms rather than solving the core problem of aligning AIs. Looking at your website, I see that it already assumes AI is mostly aligned; issues with rogue AIs are not mentioned in the risks section, and so on.