I briefly read the 6pack.care website and your post. It sounds to me like an idea that supplements existing AI safety paradigms rather than solving the core problem of aligning AIs. Looking at your website, I see that it already assumes the AI is mostly aligned, and issues with rogue AIs are not mentioned in the risks section.
A midsize city is hit by floods. The city launches a simple chatbot to help people apply for emergency cash. Here is what attentiveness looks like in action:
- Listening. People send voice notes or texts, or visit a kiosk. Messages stay in the original language, with a clear translation beside them. Each entry records where it came from and when.
- Mapping. The team (and the bot) sort the needs into categories: housing, wage loss, and medical care. They keep disagreements visible — renters and homeowners need different proofs.
- Receipts. Every contributor gets a link to see how their words were used and a button to say “that’s not what I meant.”
and so on.
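A minimal sketch of what one such entry could look like as a data record. The field and category names are illustrative only, not from any real system:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Entry:
    """One contribution: a voice note, a text, or a kiosk visit."""
    original_text: str              # kept in the contributor's own language
    translation: str                # clear translation shown beside it
    channel: str                    # "voice", "text", or "kiosk"
    received_at: datetime           # when it came in
    source: str                     # where it came from
    category: Optional[str] = None  # "housing", "wage_loss", "medical", ...
    receipt_url: str = ""           # link showing how the words were used
    disputed: bool = False          # set when "that's not what I meant" is pressed

# Disagreements stay visible rather than being averaged away:
# renters and homeowners, for example, need different proofs.
REQUIRED_PROOFS = {
    ("housing", "renter"): ["lease", "utility bill"],
    ("housing", "homeowner"): ["deed or mortgage statement", "insurance claim"],
}
```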
I applied for Thomas Kwa's SPAR stream, but I have some doubts about the direction of the research, and I'm posting them here to get feedback. Kwa wants to train models to produce something close to neuralese as reasoning traces and to evaluate white-box and black-box monitoring against those traces. It seems obvious to me that once a model switches to neuralese we already know something is wrong, so why test our monitors against neuralese?
Wikipedia?
source on the ED risk?
do you want to stop worrying?
I think that on most websites only about 1-10% of users actually post things. I suspect that the number of people having those weird interactions with LLMs (and stopping before posting anything) is something like 10 to 10,000 times (most likely around 100 times) bigger than what we see here.
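Rough arithmetic behind that multiplier, as a sketch (the posting rates are assumptions, not data):

```python
# If only a fraction p of affected users ever post, each visible report
# implies roughly 1/p cases in total.
for posting_rate in (0.10, 0.01):   # assumed 10% and 1% posting rates
    print(f"posting rate {posting_rate:.0%} -> "
          f"~{1 / posting_rate:.0f}x more cases than the posts we see")
# The wider 10x-10,000x span reflects extra uncertainty on top of the
# posting rate (e.g. people who stop the interaction before posting anything).
```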
why?
The goals we set for AIs in training are proxy goals. We humans also set proxy goals: we use KPIs and budgets, we talk about solving alignment or ending malaria (proxies for increasing utility and saving lives), and so on. Yet we can somehow focus on proxy goals while maintaining that we have some higher-level goal at the same time. How is this possible? How can we teach AI to do that?
So I got what I wanted. I tried the Zed code editor and, well... it's free and very agentic. I haven't tried Cursor, but I think it might be on the same level.
I am creating a comparative analysis of cross-posted posts on LW and EAF. Make your bets!
I will pull all the posts that were posted on both LW and EAF and compare how different topics get different amounts of karma and comments (and maybe different comment sentiment) as a proxy for how interested people are and how much they agree with different claims. Make your bets and see if the results tell you anything new! I suspect that LW users are much more interested in AI safety and less vegan; they care less about animals and are more skeptical of utility maximization. Career-related posts will fare better on EAF, while rationality (as in the art of rationality) posts will do better on LW. Productivity posts will get more engagement on EAF.
It is not possible to check all the bets since the number of cross-posted posts is not that big and they are limited to specific topics.
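Here is a rough sketch of the data pull I have in mind. It assumes both sites still expose the shared ForumMagnum GraphQL API, that the field names (baseScore, commentCount) are still right, and that naive title matching is good enough to find cross-posts; treat all of that as assumptions rather than a finished script.

```python
import requests

# Minimal GraphQL query; the view name, limit cap, and fields may have drifted.
QUERY = """
{ posts(input: {terms: {view: "top", limit: 500}}) {
    results { title baseScore commentCount postedAt }
} }
"""

def fetch(endpoint: str) -> dict:
    resp = requests.post(endpoint, json={"query": QUERY})
    resp.raise_for_status()
    return {p["title"].strip().lower(): p
            for p in resp.json()["data"]["posts"]["results"]}

lw = fetch("https://www.lesswrong.com/graphql")
eaf = fetch("https://forum.effectivealtruism.org/graphql")

# Naive cross-post detection: same (normalized) title on both forums.
for title in sorted(lw.keys() & eaf.keys()):
    print(f"{title[:60]:<60} "
          f"LW karma {lw[title]['baseScore']:>5}  "
          f"EAF karma {eaf[title]['baseScore']:>5}  "
          f"LW comments {lw[title]['commentCount']:>4}  "
          f"EAF comments {eaf[title]['commentCount']:>4}")
```

Topic labels (AI safety, animals, careers, rationality, productivity, ...) would still have to come from tags or manual coding before the per-topic comparison.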