Links #1: 2026/05 Part 1

papetoast

Preface

^ means articles I read in full; otherwise, assume I skimmed them.
I show my discovery graph in (via …) blocks. Entries without (via …) usually come from my RSS reader or from the recommendation algorithm of the corresponding website.
This is approximately a 1-in-20 filter of content.
This is very disorganized, but hopefully still useful.
- Sometimes quotes are not in quote blocks, but should be obvious in context.
- Links in quotes are sometimes removed.
- I will probably pull engineering-related links out into their own section next time

How I would use my linkpost

Sometimes, the only thing worth reading is the title! Read it and move on.
For HackerNews entries, if you choose to read the article, also ask an LLM for things that are worth reading in the comments
Beware systematic selection biases:
- I mostly don't read AI policy stuff
- Very engineering centered

Everything Else

https://lobste.rs/s/ifcyr1/contributor_poker_zig_s_ai_ban#c_cbtxub (via https://simonwillison.net/2026/Apr/30/andrew-kelley/)

It's a common misconception that we can't tell who is using LLM and who is not. I'm sure we didn't catch 100% of LLM-assisted PRs over the past few months, but the kind of mistakes humans make are fundamentally different than LLM hallucinations, making them easy to spot. Furthermore, people who come from the world of agentic coding have a certain digital smell that is not obvious to them but is obvious to those who abstain. It's like when a smoker walks into the room, everybody who doesn't smoke instantly knows it.
I'm not telling you not to smoke, but I am telling you not to smoke in my house.

^https://blog.kevinzwu.com/chinese-cursed-logographic-dags/ (via https://slimemoldtimemold.com/2026/04/30/links-for-april-2026/)

^https://blog.sus.cat/p/not-being-able-to-speak-faq (via https://slimemoldtimemold.com/2026/04/30/links-for-april-2026/)

Apple credits ‘most popular’ ever iPhone for booming sales https://www.ft.com/content/9215954c-47d0-4a59-8ffe-bf49f8ef9a97
Apple CEO Tim Cook said “extraordinary demand” for the iPhone 17 family powered the company’s strongest ever financial results for this period of the year.

Glyphosate https://www.nytimes.com/2026/01/02/climate/glyphosate-roundup-retracted-study.html (via https://statmodeling.stat.columbia.edu/2026/04/30/53020/)

A Study Is Retracted, Renewing Concerns About the Weedkiller Roundup
Problems with a 25-year-old landmark paper on the safety of Roundup’s active ingredient, glyphosate, have led to calls for the E.P.A. to reassess the widely used chemical.
In 2000, a landmark study claimed to set the record straight on glyphosate, a contentious weedkiller used on hundreds of millions of acres of farmland. The paper found that the chemical, the active ingredient in Roundup, wasn’t a human health risk despite evidence of a cancer link.

https://brooker.co.za/blog/2026/04/30/be-right.html

I want to highlight some of the work we’re doing at AWS on agent correctness. This is just a sample of a large body of work, but shows the direction we’re heading in.
Correct-by-construction coding tools and languages, like Hydro for distributed systems, and Cedar for auth. These are tools that agents can use to avoid entire classes of high-consequence defects.
Spec-driven development in agents like Kiro, which gives the coding agent additional big picture context that helps it evolve systems over time without regressing on key properties. Property-based testing is another example of the same pattern at a different scale.
Code reasoning tools like Strata, powered by Lean, that allow agents to reason formally about properties of code.
Autoformalization, turning natural language into precise formal implementations, in Bedrock AR Checks and AgentCore Policy, which remove whole classes of runtime defects (especially in critical places like tool call safety).
Deterministic and precise policy for tools, in Trusted Remote Execution and AgentCore Policy, which precisely constrain agents tool call behavior.
Principled approaches to deterministic agent steering, like Strands Steering, which can keep agents on the right path while still taking advantage of their power and flexibility.

Elon Musk’s 7 biggest stumbles on the stand at OpenAI trial
Most notable,

OpenAI’s lawyer managed to get him to make several concessions over his own lawyer’s objections.
He also lost a fight to keep xAI’s safety record off the table, calling his reputation as a supposed AI savior defending OpenAI’s mission into question.
He repeatedly appeared dishonest, as OpenAI’s lawyer showed documents contradicting his testimony.
He appeared disingenuous when confronted with calling OpenAI’s safety team “jackasses.”
He appeared disingenuous again when admitting that he didn’t know what “safety cards” are, even though his own AI firm issues them.
Perhaps most embarrassing, he testified that he never loses his temper before raising his voice at OpenAI’s lawyer.
Finally, his lawyers failed to keep his ties to Donald Trump off the record, with the judge agreeing to hear discussions that might further discredit Musk’s testimony.

Libghostty can now be used to fuzz TUIs

FastCGI: 30 years old and still the better protocol for reverse proxies (via https://news.ycombinator.com/item?id=47950510)
I don't care about FastCGI, but the article discusses HTTP desync attacks and untrusted headers, which are interesting.

There is no explicit framing of HTTP messages - the message itself describes where it ends, and there are multiple ways for a message to do that, all with their own edge cases. Implementations can disagree about where a message ends, and consequently, where the next message begins.
HTTP/2, when consistently used between the proxy and backend, fixes desync by putting clear boundaries around messages.
…
HTTP has no robust way for the proxy to convey trusted information about the request, such as the real client IP address, authenticated username (if the proxy handles authentication), or client certificate details (if mTLS is used).
The only option is to stick this information in HTTP headers, alongside the headers proxied from the client, without a clear structural distinction between trusted headers from the proxy and untrusted headers from a potential attacker. For example, the X-Real-IP header is often used to convey the client's real IP address. In theory, if your proxy correctly deletes all instances of the X-Real-IP header (not just the first, and including case variations like x-REaL-ip) before adding its own, you're safe.
In practice, this is a minefield and there are an awful lot of ways your backend can end up trusting attacker-controlled data. Your proxy really needs to delete not just X-Real-IP, but any header that's used for this sort of thing, just in case some part of your stack relies on it without your knowledge. For example, the Chi middleware determines the client's real IP address by looking at the True-Client-IP header first. Only if True-Client-IP doesn't exist does it use X-Real-IP. So even if your proxy does the right thing with X-Real-IP, you can still be pwned by an attacker sending a True-Client-IP header.

Further reading:

https://portswigger.net/research/http-desync-attacks-request-smuggling-reborn
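To make the untrusted-header point concrete, here is a minimal sketch of what "delete every instance, including case variations, before adding your own" could look like. It is not how any particular proxy implements this. The `sanitize_headers` function and the `TRUSTED_IP_HEADERS` set are hypothetical names, and including X-Forwarded-For in the set is my assumption (the quote only names X-Real-IP and True-Client-IP).

```python
# Header names a backend might trust for the client IP. The first two are named
# in the quoted article; X-Forwarded-For is an assumption added for illustration.
TRUSTED_IP_HEADERS = {"x-real-ip", "true-client-ip", "x-forwarded-for"}


def sanitize_headers(client_headers, real_ip):
    """Return headers that are safe to forward to the backend.

    client_headers is a list of (name, value) pairs, so repeated headers stay
    visible as separate entries instead of being collapsed by a dict.
    """
    forwarded = [
        (name, value)
        for name, value in client_headers
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() not in TRUSTED_IP_HEADERS
    ]
    # Only after every client-supplied copy is gone does the proxy add its own.
    forwarded.append(("X-Real-IP", real_ip))
    return forwarded


incoming = [
    ("Host", "example.com"),
    ("x-REaL-ip", "6.6.6.6"),       # case variation
    ("X-Real-IP", "7.7.7.7"),       # second instance
    ("True-Client-IP", "8.8.8.8"),  # the header Chi checks first
]
print(sanitize_headers(incoming, "203.0.113.9"))
```

The weak spot is the set itself: any header your stack trusts but that is missing from it passes straight through, which is the "minefield" the article describes.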

A B300 server (=8 B300s) is only 2x the price in China vs in the US (1M vs 0.55M) according to Reuters. (via https://www.cna.com.tw/news/aopl/202604300273.aspx via Facebook)
This is a surprisingly low premium.

Shatter resistance and scratch resistance are inversely related
https://www.youtube.com/watch?v=7YrdI7h2XoY

WhatCable, a tiny macOS menu bar app for inspecting USB-C cables

https://news.ycombinator.com/item?id=47972798
As an English-as-second-language speaker and writer, one thing Grok really shines at is capturing the tone and level of "formality" of a piece of text and then replicating it correctly. It seems to understand the little human subtleties of language in a way the other major providers don't. ChatGPT goes overly stiff and formal sounding, or ends up in a weird "aye guvnor" type informal language (Claude is sometimes better but not always).

How Mark Klein told the EFF about Room 641A [book excerpt] (via https://news.ycombinator.com/item?id=47965060)

With all due respect, Cindy, you don’t know if they are classified since they don’t have to have markings and can still be classified. Only we can tell. And if they are classified, you are likely in trouble.

How an Oil Refinery Works (via https://news.ycombinator.com/item?id=47962548)
Comments:

A few corrections. Credentials: I am a Chemical Engineer in a Senior Technical Leadership position at a refinery with over thirty years of experience. (plus the parent comment)

California doesn't build pipelines so is entirely dependent on seaborne oil imports (~75%) despite the US being a net energy exporter. Last I checked, ~20% of that foreign oil comes through the Strait (from Iraq, mostly) so, interestingly, CA is more vulnerable to the Strait of Hormuz closure than the rest of the country.
There is some out of date information here. California is a net importer of gasoline since refinery closures in California have outpaced reduced demand from increased fleet fuel efficiency / BEV adoption. There are refineries in Asia that export California and some other US refineries can also make California grade gasoline but this requires shipping via the Panama Canal on Jones act ships that are scarce and expensive.

^https://www.thenewatlantis.com/collections/how-the-system-works (via a LW linkpost that I couldn't find)
An essay series on the hidden mechanisms that support modern life — and what happens if we don’t maintain them

Farming
Water
Electricity

^https://www.cs.virginia.edu/~robins/YouAndYourResearch.html
(Extremely hard to read on desktop - either edit the html to add <meta name="viewport" content="width=device-width,initial-scale=1"> and use the devtools to make the viewport less wide, or read on mobile)

Wow, is this talk where all the great ideas come from?

So the way to manage yourself is that when you have a real important problem you don't let anything else get the center of your attention - you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free.

The most important thing in your field troll
The 10-20 most important problems looking for an attack
working with the door open

I have now come down to a topic which is very distasteful; it is not sufficient to do a job, you have to sell it. 'Selling' to a scientist is an awkward thing to do. It's very ugly; you shouldn't have to do it. The world is supposed to be waiting, and when you do something great, they should rush out and welcome it. But the fact is everyone is busy with their own work. You must present it so well that they will set aside what they are doing, look at what you've done, read it, and come back and say, "Yes, that was good.'' I suggest that when you open a journal, as you turn the pages, you ask why you read some articles and not others. You had better write your report so when it is published in the Physical Review, or wherever else you want it, as the readers are turning the pages they won't just turn your pages but they will stop and read yours. If they don't stop and read it, you won't get credit.

Question: Is brainstorming a daily process?
Hamming: Once that was a very popular thing, but it seems not to have paid off. For myself I find it desirable to talk to other people; but a session of brainstorming is seldom worthwhile. I do go in to strictly talk to somebody and say, "Look, I think there has to be something here. Here's what I think I see ...'' and then begin talking back and forth. But you want to pick capable people. To use another analogy, you know the idea called the 'critical mass.' If you have enough stuff you have critical mass. There is also the idea I used to call sound absorbers'. When you get too many sound absorbers, you give out an idea and they merely say, "Yes, yes, yes.'' What you want to do is get that critical mass in action; "Yes, that reminds me of so and so,'' or, "Have you thought about that or this?'' When you talk to other people, you want to get rid of those sound absorbers who are nice people but merely say, "Oh yes,'' and to find those who will stimulate you right back.

https://blog.andymasley.com/p/data-center-land-use-issues-are-fake (via https://simonwillison.net/2026/May/4/andy-masley/)
Probably what I'll send to my friends if they ever say anything about AI data centers using up a lot of land

https://www.theguardian.com/society/2026/mar/30/court-of-appeal-says-it-cannot-rule-on-which-identical-twin-fathered-a-child (via https://www.lesswrong.com/posts/fP4iMX8ziGWWWHf2f/april-2026-links)
A woman who had sex with identical twins within four days of each other is unable to ensure one of them takes parental responsibility because it is “not possible” to know which is the father, the court of appeal has said.

Canadian election databases use “canary traps”—and they work
https://arstechnica.com/tech-policy/2026/05/in-canada-a-canary-trap-springs-shut-and-ids-election-database-leak/

Talking to 35 Strangers at the Gym (via https://news.ycombinator.com/item?id=48007438)

OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors (via https://news.ycombinator.com/item?id=47991981)
hereme888: Hyped title. It was exclusively text-based diagnosis after physicians did the whole interview, exam, labs, etc.

Also, later in the encounter, with more chart information, AI scored 82%, physicians 70–79%; that difference was reportedly not statistically significant.

https://tweedegolf.nl/en/blog/235/debloat-your-async-rust (via https://news.ycombinator.com/item?id=48019163)

Shell / Terminal tricks

https://reflex.dev/blog/computer-use-is-45x-more-expensive-than-structured-apis/ (via https://news.ycombinator.com/item?id=48024859)

https://peps.python.org/pep-0789/
A very interesting, highly compressed explanation of why a whole category of weird task-cancellation bugs in async generators exists. I needed to ask an AI to explain basically every example given, but by the end I felt like I had learned a lot.

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ (via https://peps.python.org/pep-0789/)

Go statements break abstraction. Remember how we said that if our language allows goto, then any function might be a goto in disguise? In most concurrency frameworks, go statements cause the exact same problem: whenever you call a function, it might or might not spawn some background task. The function seemed to return, but is it still running in the background? There's no way to know without reading all its source code, transitively. When will it finish? Hard to say. If you have go statements, then functions are no longer black boxes with respect to control flow. In my first post on concurrency APIs, I called this "violating causality", and found that it was the root cause of many common, real-world issues in programs using asyncio and Twisted, like problems with backpressure, problems with shutting down properly, and so forth.
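A small sketch of the contrast the quote is drawing, using plain asyncio. This is my illustration, not code from the essay. `leaky` and `scoped` are hypothetical names, and `asyncio.gather` stands in for the essay's nursery idea only in the narrow sense that the function does not return until its children finish.

```python
import asyncio


async def worker(log, name, delay):
    await asyncio.sleep(delay)
    log.append(f"{name} done")


async def leaky(log):
    # "go statement" style: spawn a background task and return immediately.
    task = asyncio.create_task(worker(log, "leaky-child", 0.01))
    log.append("leaky returned")
    # The task is handed back only so this demo can wait for it; a real
    # fire-and-forget function would give the caller nothing to wait on.
    return task


async def scoped(log):
    # Structured style: children are awaited before the function returns,
    # so "scoped returned" implies nothing is still running in the background.
    await asyncio.gather(
        worker(log, "scoped-a", 0.01),
        worker(log, "scoped-b", 0.02),
    )
    log.append("scoped returned")


async def demo():
    log = []
    task = await leaky(log)
    await task
    await scoped(log)
    return log


print(asyncio.run(demo()))
```

In the printed log, "leaky returned" appears before its child finishes, while "scoped returned" appears only after both children are done. That ordering is the black-box property the essay argues for.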

The Hidden Math Behind 3D Eye Tracking: a basic explanation of how 3D eye tracking works

https://deepmind.google/blog/alphaevolve-impact/ (via https://www.lesswrong.com/posts/iDiw3o8AyfmZfZ3hY/ben-livengood-s-shortform?commentId=TsCY9rrFBLmBQsjPf)

"AlphaEvolve began optimizing the lowest levels of hardware powering our AI stacks. It proposed a circuit design so counterintuitive yet efficient that it was integrated directly into the silicon of our next-generation TPUs. This is the latest example of TPU brains helping design next-generation TPU bodies.” — Jeff Dean, Chief Scientist, Google DeepMind and Google Research

https://en.wikipedia.org/wiki/Virus_classification#Subviral_agents: "Subviral agents are smaller than viruses and have only some of their properties". I had always known there were "living" things smaller than a virus, but it was fun reading about the different types.

Saw this while clicking around Wikipedia when I was searching for information about Hantavirus

https://journals.sagepub.com/doi/10.1177/00222437251381209 (via https://statmodeling.stat.columbia.edu/2026/05/08/the-pick-the-winner-picker-heuristic-preference-for-categorically-correct-forecasts/)

Experts characterize good forecasting as minimization of continuous error (i.e., predictions close to the eventual outcome). By contrast, the present work reveals that laypeople typically see good forecasts as those that correctly predict an event’s categorical outcome (e.g., the winning team). […] Thus, in the common case when the categorical dimension matters most (e.g., sports contests), people prize forecasts that accurately predicted the categorical outcome (e.g., the winner, not the margin of victory).
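A toy example of how the two standards can disagree. The numbers are made up by me, not taken from the paper, and the helper names are hypothetical. Margins are "home team minus away team", and ties are ignored.

```python
def absolute_error(forecast_margin, actual_margin):
    """Continuous error: how far the predicted margin was from the real one."""
    return abs(forecast_margin - actual_margin)


def picked_winner(forecast_margin, actual_margin):
    """Categorical correctness: did the forecast name the team that won?"""
    return (forecast_margin > 0) == (actual_margin > 0)


actual = 1           # home team wins by 1 (invented outcome)
close_forecast = -1  # predicted away win by 1: tiny error, wrong winner
far_forecast = 15    # predicted home win by 15: big error, right winner

for name, forecast in [("close", close_forecast), ("far", far_forecast)]:
    print(name, absolute_error(forecast, actual), picked_winner(forecast, actual))
```

By the experts' standard the "close" forecast is better (error 2 vs 14). By the lay standard described in the abstract, the "far" forecast wins because it picked the winner.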

https://www.cremieux.xyz/p/do-teachers-need-advanced-degrees
Teachers having a master's degree doesn't affect test scores
Teacher experience matters (± 2pts)

https://fourlightyears.blogspot.com/2026/05/i-returned-to-aws-and-was-reminded-hard.html (via https://news.ycombinator.com/item?id=48073201)
Short list of various issues with Amazon Web Services

https://arstechnica.com/gadgets/2026/05/starlink-blocks-access-to-its-gps-alternative-ahead-of-spacex-ipo/
Starlink can act as an alternative GPS

https://simonwillison.net/2026/May/11/learning-on-the-shop-floor/

River does not respond to direct messages. She politely declines and suggests to create a public channel for you and her to start working in. I myself work with river in #tobi_river channel and many followed this pattern. Every conversation is therefore searchable. Anyone at Shopify can jump in. In my own channel, there are over 100 people who, react to threads, add color and add context, pick up the torch, help with the reviews, remind me how rusty I am, and importantly, learn from watching. [...]
As so often with German, there is a word for the kind of environment: Lehrwerkstatt. Literally: A teaching workshop. The whole shop floor is the classroom. You learn by being near the work. Being a constant learner is one of the core values of the firm.
Shopify wants to be a Lehrwerkstatt at scale and River has now gotten us closer to this ideal than ever. It’s osmosis learning, because it does not require a curriculum, a training plan, or a manager. It just requires everyone's work to be visible to the maximum extent possible. Everyone learns from each other.

I'm reminded of how Midjourney spent its first few years with the primary interface being public Discord channels, forcing users to share their prompts and learn from each other's experiments. I continue to believe that the early success of Midjourney was tied to this mechanism, helping to compensate for how weird and finicky text-to-image prompting is.

https://www.seangoedecke.com/space-ai-datacenters-do-not-have-a-cooling-problem/

we’d need 250,000 square metres of radiation area. The largest current radiator in space is probably the ISS, at around a thousand square metres. Is scaling that up by 250x a lot? Yes, but it’s not necessarily ridiculous.
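For a feel of where a number like that comes from, here is a back-of-envelope sketch using the Stefan-Boltzmann law. The inputs (100 MW of heat, a 300 K radiator, emissivity 0.9, one radiating side, no absorbed sunlight) are my assumptions for illustration, not the article's stated parameters.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_area_m2(heat_watts, temperature_k, emissivity=0.9):
    """Area needed to radiate heat_watts from a surface at temperature_k.

    Idealised: one radiating side, no incoming sunlight or Earth glow.
    """
    flux_w_per_m2 = emissivity * SIGMA * temperature_k ** 4
    return heat_watts / flux_w_per_m2


# Assumed inputs, chosen only for illustration.
area = radiator_area_m2(heat_watts=100e6, temperature_k=300.0)
print(f"{area:,.0f} m^2")

# The scale-up quoted above: 250,000 m^2 against roughly 1,000 m^2 on the ISS.
print(250_000 / 1_000)
```

With these assumed inputs the area lands in the low 240,000s of square metres, the same order of magnitude as the quoted figure. Because radiated power scales with the fourth power of temperature, running the radiator hotter shrinks the area quickly.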

https://a11ymyths.com/ I still mostly don't care about accessibility, but it is a good read. Apparently Netflix and Domino's got sued for not being accessible?

https://www.theverge.com/ai-artificial-intelligence/931006/musk-v-altman-closing-arguments-analysis
Lots of interesting evidence that was used in Musk v. OpenAI

https://csswizardry.com/2023/09/the-ultimate-lqip-lcp-technique/
Low Quality Image Placeholder

https://arstechnica.com/gadgets/2026/05/russia-pressures-university-students-to-become-wartime-drone-pilots/

Russian universities are promising free tuition and up to $70,000 to students who are willing to serve as drone pilots in the Russian military for a year

https://news.ycombinator.com/item?id=48120629
For the past days I've been participating (albeit over Teams) in a conference relevant to my industry (intel), basically startups and established companies showcasing their products to a closed audience of EU gov. officials.

One thing I noticed right away, is that all companies were asked "Can we fully host this from within EU or our country" from the various people in audience. Every single one. Many of the startups had slides prepared for this.

Definitely a change, because it is not something I can recall being important just a couple of years ago.

https://arstechnica.com/science/2026/05/running-the-numbers-on-a-zero-emission-way-to-make-cement/
Cement production alone currently accounts for about 8 percent of global CO2 emissions, so considerable effort is going into lowering that number. Efficiency can be increased, and energy sources can be swapped for cleaner ones, but a stubborn reality remains: The byproduct of turning limestone into lime during cement production releases CO2 gas. These “direct process emissions” are actually slightly larger than the emissions from burning fuel to heat the kilns and drive this process.

^ https://arstechnica.com/science/2026/05/solar-power-production-undercut-by-coal-pollution/
new study suggests that the problems with coal-derived pollution go beyond health; it interferes with other power sources. Researchers have found that aerosols, both natural and human-derived, significantly reduce the power we could be getting from solar panels, to the tune of hundreds of terawatts a year. And a lot of those aerosols come from burning coal.

In 2023, for example, over a quarter of the potential solar power production was lost, with over 20 percent due to clouds and another 6 percent from aerosols.

^ The US Is Using AI to Hunt Down Insider Trading on Polymarket
In April, after significant backlash over suspected insider trading, Polymarket announced its own partnership with Chainalysis. It was part of a broader push to crack down on market manipulation. While the company’s CEO, Shayne Coplan, had talked in the past about why insider trading could be good for prediction markets, Polymarket changed its approach this spring, updating its market integrity rules and announcing a partnership with Palantir for its US-based sports markets

How I use LLMs as a staff engineer in 2026

https://www.theargumentmag.com/p/the-meritocracy-of-circadian-rhythms
In 2019, California passed SB 328, requiring high schools to start no earlier than 8:30 a.m. and middle schools no earlier than 8 a.m. That is, a bill to let kids sleep in. And they did, according to a promising new study — roughly 46 more minutes a night.

Cuba runs out of diesel and fuel oil

Donald Trump turns up the heat on Cuba

Bun's problem may be developing in the open
Good take. People are overreacting.

Zvi

AI 166

https://www.lesswrong.com/posts/zGyzyJJkTxbhReZP7/ai-166-google-sells-out

^More confirmation of successful stylometry by LLMs: Megan McArdle confirms that Opus 4.7 can unmask writers

Even 0.5% additional productivity growth from AI would stabilize the American debt-to-GDP ratio on its own, despite all our otherwise reckless overspending, because in addition to direct GDP expansion it drives down bond yields. Even 0.1% additional productivity growth reduces nominal Treasury yields by 70bps (0.7%).
I think this dangerously conflates market expectations of productivity growth with actual productivity growth, and I haven’t checked their math, but the basic principle is correct that it only takes expectations of what are basically ‘baked in’ levels of productivity growth to stabilize our debt in the medium term in ‘AI as normal technology’ worlds.

Housing Roundup 13

Connor O’Brien
In the American Community Survey, the median age of heads of households who are 1) homeowners and 2) moved in within the last year is 41.

Childhood And Education #17: Is Our Children Reading

https://www.lesswrong.com/posts/dm2vQZPZcSKb8FhWw/childhood-and-education-17-is-our-children-reading

Black students are as likely to be basic-or-above readers in Mississippi (where the median Black household income was $37,900 in 2023) as in national top performer Massachusetts (where the median Black household income was $67,000 in 2022.)

The states adopted reading curricula backed by actual scientific research. This led to them adopting phonics-based early literacy programs and rejecting ones that used the debunked “whole language” method that encourages students to vaguely guess at words based on context instead of figuring them out sound-by-sound.
This is the part of the story that has gotten the most attention — teach phonics! And you should, indeed, teach phonics. But making schools adopt the approach took more than a mere nudge. The Southern Surge states have tried earmarked funding, guidance to districts, and outright mandates to accomplish universal adoption.
… The second pillar, White told me, is “a scaled system of training those teachers on that curriculum — most teaching you get as a teacher is not training on the curriculum.”
… The third pillar is everyone’s least favorite, but it’s equally crucial. “Number three is clear accountability at the district level, at the school level, at the educator level, and at the student and parent level,” White said.
… In Mississippi, a child who isn’t capable of reading at the end of third grade has to repeat the grade — a policy called third grade retention. Alabama and Tennessee have implemented it too. Research has found that third grade retention doesn’t harm students in non-academic ways and tends to help them academically — but, of course, it’s upsetting for kids, frustrating for families, and unpleasant for educators. Unfortunately, that’s probably part of why it works.

AI 168

https://www.lesswrong.com/posts/E4uEiMSpnYRotfzJX/ai-168-not-leading-the-future

A Nature meta analysis of AI learning studies, that claimed ChatGPT could benefit students, has been retracted due to discrepancies and concerns about the quality of the included studies.

Monthly Roundup 42

https://www.lesswrong.com/posts/gZHNLmHkQ7GjnWsYh/monthly-roundup-42-may-2026
Oliver Habryka: In short: If you want a space to feel natural, buy lightbulbs with at least 95 CRI, ideally 98.

… If you are lighting a room with plenty of natural light, just use 2000K-3000K lights.

… The world got ugly when we invented LEDs.

If you are on a trip abroad and you are asked in what currency you wish to pay, it is almost always cheaper to pay with your card in local currency than in dollars.

This is a great idea:

shako: substack should have a button on paid articles, enabled as an option by authors, that lets a benefactor “buy it for everyone.” e.g. a rich benefactor may think it to be worth $500 or some such amount to make an article behind a paywall free for everyone.

LessWrong

What do Russian olympiad winners think of HPMOR? Our data

https://www.lesswrong.com/posts/3iPQJd3LdHEMZvcHa/fessus-s-shortform?commentId=zTZ5WCoKELHoA8jfN
The eventual success of rationalists will result in the death of the label "rationalists." Take "liberalism" as an example. In the eighteenth century, liberals were a specific political group that opposed absolute monarchy (I'm oversimplifying a bit). Now, mostly everyone opposes absolute monarchy. The "liberal" label has come to refer to left-wing instead.

o3 has been used for nuclear weapons research

https://www.lesswrong.com/posts/CBTe8Etwb9wdjbpZC/returns-to-intelligence (via https://www.lesswrong.com/posts/fP4iMX8ziGWWWHf2f/april-2026-links)
Assembling a 4x4x4 cube with 6 chiral pentacubes + 2 dominos in the head using 3 hours + sleep

^ Bad Problems Don't Stop Being Bad Because Somebody's Wrong About Fault Analysis

Here's a dynamic I’ve seen at least a dozen times:
Alice: Man that article has a very inaccurate/misleading/horrifying headline.
Bob: Did you know, actually article writers don't write their own headlines?
…
But what I care about is the misleading headline, not your org chart

^https://www.lesswrong.com/posts/3LcyoqNTJuCZ65MbL/mo-putera-s-shortform?commentId=TfeGhHBh35rffdo6W
Fwiw I'm >9:1 confident that "I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me" is written with substantial AI assistance. It gets 100% on Pangram and the tone and rhythm just seems way too AI-like. Maybe you're like "oh doesn't that just prove his point?" But no.

^ A Year Late, Claude Finally Beats Pokémon

^https://www.lesswrong.com/posts/eFD3rozNCZKMe4rTs/mats-9-retrospective-and-advice

I made it my goal to know every fellow who was in-person in Berkeley, and I think I succeeded? Or at least, after week 2 I stopped seeing new faces around and by week 3 I think I had everyone’s names memorised and had had a small chat with just about everyone. This was really good, strongly recommend. Even if you’re not a “social person”, there’s a big difference between socialising all the time and knowing the fellows well enough that you can spot them in a crowded room.

My teammate and I did these via google slides, and largely followed the advice here. Some tips are harder to do, but a couple that are definitely worthwhile and easy to implement:
Have one set of slides that you prepend to each week, instead of multiple slide decks (so you can reference previous work if it comes up).
Don’t try to cram many things on one slide, just use dozens of slides.

my dotfiles are here, get your agent to have a look

dataset generation: Claude is very bad at getting another LLM to generate coherent, high quality datasets and then auditing the dataset for quality. Claude will claim a dataset is good, but manual checking reveals the dataset to be not strongly exhibiting the trait we care about, or it is too “meta” (e.g. the response talks about exhibiting the trait instead of actually just exhibiting the trait, and Claude doesn’t pick this up), or Claude will do regex to check for plausible words associated with the trait of interest but not actually read the responses. This is made worse because Claude will talk as though it had done a thorough audit even though it had vaguely glanced in the right direction.
“double blind” reviews and scientific thinking: Claude doesn’t have a good practical understanding of scientific thinking in a way that’s hard to describe. Often, I asked it to spawn sub-agents to “blindly” review two datasets (one exhibiting some behaviour, one not exhibiting some behaviour) and ask the subagent to guess what the behaviour was. Claude would frequently put the name of the behaviour in the file name and ask the subagent “what behaviour is systematically expressed in examples-of-sycophancy.jsonl” (I’m not exaggerating). Claude will very happily talk about the mechanisms of blind review, but does not reliably implement them, and frequently reports that it has done a blind review despite this. There are other ways in which Claude doesn’t “think scientifically”, and will very happily cherry-pick results, make claims that are not backed up by the experimental results, fail to find (fairly simple) flaws in an experimental setup, disregard negative evidence as noise, put big green emojis next to trivially-true things (“dataset has 500 rows✅”) and hide or not equally emphasise problems (“dataset only contains tags, no responses”).
repetitive tasks: Claude is very lazy at repetitive tasks that can’t be programmatically done via a tool (even if that tool is another LLM). e.g. “check these 150 responses for sycophantic behaviour” or “evaluate which of these 20 responses are the most pirate-like”. I’ve gotten around this by giving Claude an OpenRouter key and getting it to write a script to call another LLM, but even then Claude will often “spot check” <10% of items and call it a day.
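One way to take the file-name leak described above out of the agent's hands is to blind the files yourself before any reviewer sees them. This is a minimal sketch of that idea, not the retrospective author's setup; `make_blind_copies` and the `dataset_N` naming are hypothetical.

```python
import random
import shutil
import tempfile
from pathlib import Path


def make_blind_copies(dataset_paths, seed=None):
    """Copy datasets under neutral names and return (blind_dir, answer_key).

    The reviewer only ever sees blind_dir. The answer key stays with whoever
    scores the guesses afterwards.
    """
    rng = random.Random(seed)
    shuffled = [Path(p) for p in dataset_paths]
    rng.shuffle(shuffled)  # so position does not reveal which file is which

    blind_dir = Path(tempfile.mkdtemp(prefix="blind_review_"))
    answer_key = {}
    for index, source in enumerate(shuffled):
        blind_name = f"dataset_{index}{source.suffix}"
        shutil.copyfile(source, blind_dir / blind_name)
        answer_key[blind_name] = source.name
    return blind_dir, answer_key
```

Usage would be something like `blind_dir, key = make_blind_copies(["examples-of-sycophancy.jsonl", "control.jsonl"])`, then pointing the subagent at `blind_dir` only. This only hides file names. If the trait name appears inside the records (field names, system prompts), the review still is not blind.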

what does the prior work on intentional practice for research taste look like?

https://www.lesswrong.com/posts/7mnQixaWC747dm76h/tomas-b-s-shortform?commentId=orcAwSLXRPayb5CTc
Richard tests his audience to see if they can tell the difference between his writing and Claude's. They mostly can't.

Linkposts

^https://slimemoldtimemold.com/2026/04/30/links-for-april-2026/

wtf Inkhaven is sponsored by WordPress.com (apparently different from WordPress)
Is slimemoldtimemold a LW user?

^https://www.lesswrong.com/posts/fP4iMX8ziGWWWHf2f/april-2026-links

How to take a perfect dating photo: I largely agree with the technical details of this post, but have two small, somewhat-related thoughts. First, I don’t think Photofeeler is necessarily as good as people make it out to be. I’m happy to be proven wrong. Photofeeler, as far as I can tell, doesn’t default you to the opposite sex when voting on dating photos (or I just set that setting years ago and forgot), nor can I find a setting that allows me to only allow women to vote on my dating photos. I also just went through about 20 votes, counting 15 men and 5 women. This sounds about right: men are probably more likely to use this app because they’re male-brained, maybe a bit more analytical with numbers and optimization, women don’t want to get creeped on on there, etc. So in the end, if my percentages hold, 75% of votes on my pictures are from a certain subset of men rating my photos and 25% from a certain subset of women. I don’t know if I necessarily trust these results? After all, they are the same people who are asking for advice on their photos! If they knew what a good picture looked like, wouldn’t they just take one themselves? I would much rather be able to directly access my target demographic and have them rate my pictures (or have Photofeeler reveal the demographics behind each vote) than trust some random people on the internet. That said, maybe there’s some prediction market/wisdom of the crowd thing going on that is helpful? Disclaimer: I got an 8.5, 8.9, 9.2 STA score on a picture and it was by far my most liked on a dating app, so maybe there is something there. Further, I think there’s an incentive to just quickly click through buttons to maximize credits per unit time. I’m not sure if they are able to detect this? Like you said, it may weight their votes less than someone who varies their scores, but I think there may still be an impact. 
Second, I think there can be something icky about dating photos if done improperly or they’re by themselves in that they (can) signal that your life isn’t interesting enough to have candid photos and you have to resort to dressing up, posing, lighting, and angles to sell yourself as a package. Sure, you may not be the picture-taking type, but I’d argue if your life is interesting enough, you’d want to get pictures on the reg of the cool, fun shit you’re doing, or you’d simply adjust your habits to take more pictures. I think there’s more nuance to these arguments than I let on, but they feel directionally correct.

https://www.lesswrong.com/posts/cAfAayy9k2H25iBHx/sarahconstantin-s-shortform?commentId=GLFywTzE2subM9icF

^https://www.lesswrong.com/posts/cAfAayy9k2H25iBHx/sarahconstantin-s-shortform?commentId=xmuTkicreRCSPsFq7
A couple links about robotics

https://academic.oup.com/jeea/article-abstract/19/6/3104/6179884 as of 2021, robots do not reduce manufacturing employment in Germany; they shift it towards programming roles.
https://www.aiprm.com/robotics-statistics/ half of robots are used in "handling". (such as pick & place.)
https://www.engineering.com/the-what-why-and-how-of-roboforming/ roboforming deforms a sheet of metal with a robot arm and steel ball; it's a tooling-free replacement for stamping.
https://www2.census.gov/library/working-papers/2023/adrm/ces/CES-WP-23-14.pdf where robots are adopted