Kajus

Comments

⿻ Plurality & 6pack.care
Kajus · 13d · 123

I briefly read the 6pack.care website and your post. It sounds to me like an idea that supplements existing AI safety paradigms rather than one that solves the core problem of aligning AIs. Looking at your website, I see it already assumes the AI is mostly aligned, and issues with rogue AIs are not mentioned in the risks section.

A midsize city is hit by floods. The city launches a simple chatbot to help people apply for emergency cash. Here is what attentiveness looks like in action:

  • Listening. People send voice notes, texts, or visit a kiosk. Messages stay in the original language, with a clear translation beside them. Each entry records where it came from and when.
  • Mapping. The team (and the bot) sort the needs into categories: housing, wage loss, and medical care. They keep disagreements visible — renters and homeowners need different proofs.
  • Receipts. Every contributor gets a link to see how their words were used and a button to say “that’s not what I meant.”

and so on. 
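Reading those bullets as a spec, here is a minimal sketch of what the data model might look like; the Entry fields and the receipt function are my own assumptions, not anything from 6pack.care:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Entry:
    """One contribution: a voice note, a text, or a kiosk visit."""
    original_text: str      # kept in the original language
    translation: str        # clear translation shown beside it
    source: str             # where it came from, e.g. "kiosk-3" or "sms"
    received_at: datetime   # when it came in
    category: str = "uncategorized"  # "housing", "wage loss", "medical care", ...
    disputed: bool = False  # flipped when the contributor says "that's not what I meant"

def receipt(entry: Entry) -> str:
    """The 'receipt' shown back to a contributor: how their words were used."""
    status = "DISPUTED" if entry.disputed else "accepted"
    return (f"[{entry.received_at.isoformat()}] via {entry.source}: "
            f"{entry.original_text!r} -> filed under {entry.category!r} ({status})")

e = Entry("Mi casa está inundada", "My house is flooded", "sms",
          datetime.now(timezone.utc), category="housing")
print(receipt(e))
```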

Kajus's Shortform
Kajus · 17d · 30

I applied to Thomas Kwa's SPAR stream, but I have some doubts about the direction of the research, so I'm posting them here to get feedback on my thinking. Kwa wants to train models to produce something close to neuralese as reasoning traces, and then to evaluate white-box and black-box monitoring against those traces. It seems obvious to me that when a model switches to neuralese, we already know something is wrong, so why test our monitors against neuralese?
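To make the objection concrete, here is a toy black-box monitor evaluated against a plain trace and a neuralese-like trace (my own illustration, not Kwa's actual setup; suspicion_score and its tiny vocabulary are made up). Even this crude monitor flags the neuralese trace immediately:

```python
def suspicion_score(trace: str) -> float:
    """Stand-in black-box monitor: the fraction of tokens that are not
    ordinary English words. A neuralese trace should score near 1.0."""
    vocab = {"the", "plan", "is", "to", "first", "check", "then", "answer"}
    tokens = trace.lower().split()
    if not tokens:
        return 1.0
    return sum(t not in vocab for t in tokens) / len(tokens)

traces = {
    "plain": "the plan is to first check then answer",
    "neuralese": "qz8 vv-3 ##k elo9 qz8 qz8",
}
THRESHOLD = 0.5  # made-up decision boundary
for name, trace in traces.items():
    score = suspicion_score(trace)
    print(f"{name}: score={score:.2f} flagged={score > THRESHOLD}")
```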

Rauno's Shortform
Kajus · 25d · 10

Wikipedia?

Tomás B.'s Shortform
Kajus · 1mo · 74

Source on the ED risk?

Daniel Kokotajlo's Shortform
Kajus · 2mo · 20

Do you want to stop worrying?

Raemon's Shortform
Kajus · 2mo · 83

I think that on most websites only about 1–10% of users actually post things. I suspect the number of people having these weird interactions with LLMs (and stopping before posting anything) is 10 to 10,000 times bigger than what we see here, most likely around 100 times.
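Back-of-the-envelope version of that estimate (observed_reports is a hypothetical input; the multipliers are the guesses above):

```python
# All numbers are guesses from the comment above, not data.
observed_reports = 20                 # hypothetical: reports we actually see posted
multipliers = (10, 100, 10_000)       # low / central / high guesses
low, central, high = (observed_reports * m for m in multipliers)
print(f"implied affected users: {low} to {high}, most likely around {central}")
```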

tailcalled's Shortform
Kajus · 3mo · 10

Why?

Kajus's Shortform
Kajus · 3mo · 30

The goals we set for AIs in training are proxy goals. We humans also set proxy goals: we use KPIs, and we talk about solving alignment or ending malaria (proxies for increasing utility and saving lives), about budgets, and so on. Yet we can somehow focus on proxy goals while maintaining a higher-level goal at the same time. How is this possible? How can we teach AI to do that?
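A toy Goodhart-style illustration of the gap (entirely my own sketch; both functions are made up): an optimizer that looks only at the proxy lands away from the true optimum, which is exactly the failure that keeping the higher-level goal in view is supposed to prevent:

```python
import numpy as np

def true_goal(x):
    # What we actually care about: peaks at x = 3.
    return -(x - 3) ** 2

def proxy(x):
    # A KPI correlated with the true goal but slightly skewed.
    return -(x - 3) ** 2 + x

xs = np.random.default_rng(0).uniform(0, 10, size=1000)  # candidate "policies"
best_by_proxy = xs[np.argmax(proxy(xs))]
best_by_goal = xs[np.argmax(true_goal(xs))]
print(f"proxy optimum x ≈ {best_by_proxy:.2f}, true optimum x ≈ {best_by_goal:.2f}")
print(f"true value lost by optimizing the proxy: "
      f"{true_goal(best_by_goal) - true_goal(best_by_proxy):.2f}")
```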

We’re in Deep Research
Kajus · 3mo · 10

So I got what I wanted. I tried the Zed code editor and, well... it's free and very agentic. I haven't tried Cursor, but I think it might be on the same level.

Open Thread Winter 2024/2025
Kajus · 3mo · 10

I don't think that anymore. I think it's possible to get labs to use your work (e.g., you devised a new eval or a new mech interp technique that solves some important problem), but it has to be good enough, and you need to find a way to communicate it. I changed my mind after EAG London.

Posts

What is the theory of change behind writing papers about AI safety? [Question] · 6mo
Kajus's Shortform · 2y