How does it work to optimize for realistic goals in physical environments of which you yourself are a part? E.g. humans and robots in the real world, rather than humans and AIs playing video games in virtual worlds where the player is not part of the environment. The authors claim we don't actually have a good theoretical understanding of this, and explore four specific ways in which we don't understand this process.

Recent Discussion

You are invited to participate in Metaculus's FRO-Casting Tournament, an exciting pilot project in partnership with The Federation of American Scientists that harnesses predictions to help assess impact, deliver feedback, and inform the allocation of $50 million to ambitious research proposals handpicked by Convergent Research.

There is a wealth of untapped scientific knowledge at the intersection of research and engineering: Industry doesn't pursue it because it's unlikely to be profitable. Academia doesn't pursue it because it's unlikely to be publishable. Enter Focused Research Organizations (FROs), non-profits modeled after startups that address well-defined technology challenges that are unlikely to be solved by industry or academia. 

By sharing your insights, you'll join forces with Metaculus Pro Forecasters and 25 subject matter experts to help generate: 

  • Risk-reward profiles of each FRO proposal
  • Actionable feedback

This is Part I in a series on easy weight loss without any need for willpower.

The Origin: listening to the dark corners of the internet

Losing weight is supposed to be really hard and require a lot of willpower, according to conventional wisdom. It turns out that it was actually really easy for me to go from a BMI above 29 (30 is officially obese) to below 25 (normal is 18 to 25) in 3½ months. And knowing what I know now, I think I could easily do it again in 1½ months.

I'm not someone who ever tried dieting before. Dieting sounded like a lot of effort and willpower for very uncertain results. Not a good use of my very limited willpower. This belief changed after...

If you shifted a large portion of your diet to potatoes, which are only 2% protein, then unless you actively compensated with protein elsewhere through further shifts in your diet, I think it is not implausible that muscle loss played a role in the weight loss you observed. If one had, say, 2.4 kg of potatoes a day (that would come to 1750 kcal, which is compatible with its use as a sole food while losing weight), one would only be getting 48 g of protein a day, while at a caloric deficit; I'd expect muscle loss with those values. And indeed, if you had maintained muscle mass, bodyweight exercises would have gotten markedly easier. Muscle is denser than fat, too, so the loss shows quite a bit on the scale, and hence may make up a significant proportion of the loss you observed.

Also, as a German, I strongly protest the notion of the poster above you that potatoes are not tasty or varied. There are over 3,000 potato cultivars, covering all sorts of colours (white, yellow, orange, red, pink, purple...), shapes, consistencies (festkochend (waxy, with bite, e.g. great for fried potatoes), mehlig (floury, creamy, e.g. if you want to mash them), vorwiegend festkochend (in between)) and tastes, from sweet, fruity and subtle to intense and hearty with earthy, aromatic and nut-like notes. A good potato cultivar, harvested fresh or at least stored correctly, is delicious, needing nothing but a bit of salt and maybe a hint of fat. We've had outright campaigns to keep particularly tasty cultivars on the market (e.g. Linda), and German farmers traditionally name the potato cultivars they are proudest of after their wives. American potatoes bred to look pretty and grow huge may be bland, but good potatoes really are not. They make an excellent stand-alone, and enrich any dish they are added to. And now I crave potatoes.
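A quick sanity check of those potato figures, assuming typical raw-potato values of roughly 73 kcal and 2 g of protein per 100 g (actual cultivars vary):

```python
# Rough check of the potato numbers above, assuming typical raw-potato
# values of ~73 kcal and ~2 g protein per 100 g (cultivars vary).
grams = 2400  # 2.4 kg of potatoes per day

kcal = grams / 100 * 73        # 1752 kcal, close to the quoted 1750
protein_g = grams / 100 * 2.0  # 48 g protein, matching the comment

print(kcal, protein_g)
```

Both quoted figures check out, which is why the low-protein concern follows directly from the 2% protein content.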
Yes, preservation via fermentation is typically achieved by putting your thing-to-be-preserved into salt in an oxygen-restricted environment, which leads to selective bacterial activity dropping the pH and hence further restricting undesired bacterial activity, while boosting beneficial bacteria, breaking down anti-nutrients, and having all sorts of beneficial health effects. Which is why I rejected the idea that "pickles are more vinegar than salt": your sole necessary starting base is salt, with the acidity a later result, and vinegar generally only the end-stage product that is often not even reached, and hence often not characteristic. Sauerkraut is indeed made out of cabbage, salt, and water only; the only acid in there is produced by bacteria, and the predominant acid we target is lactic acid. Hence high salt consumption.

The idea that pickles are just "vegetable plus vinegar" is basically a modern invention, because most pickles you get in stores aren't actually pickles: they weren't fermented, they aren't probiotic, their anti-nutrients are untouched, and they needed to be sterilised (vitamin loss) or had preservatives added (bad for the microbiome) to remain stable. They are just supposed to taste a bit like actual pickles due to the vinegar, and are hence easier, faster and more reliable to make. Actual pickles are basically vegetable plus salt plus time.

Which acids you get depends on a number of factors, like the temperature you keep it at, whether you are working with mixed bacteria, fungi, or combinations thereof (sourdough, kombucha...), what your starting ingredients are, and how long you ferment without adding more raw material. E.g. my sourdough will alternate between being dominated by lactic acid and dominated by acetic acid, depending on how cold I keep it (fridge or outside) and how much I feed it (daily or less often), which leads to dominance of different groups of bacteria or fungi, and also has them either metabolising ra
Did you count calories? Did you try to keep the same amount of calories of the replaced meals, but with potatoes?

That would only be meaningful if OP had accurately weighed and tracked the food, which is enough of a hassle that this would have been mentioned, I think. And without it... you would naturally assume that OP consumed fewer calories, because a significant part of their diet was now a highly satiating low calorie food with resistant starch. That would definitely be my guess.

Knut Wicksell - Wikipedia
Knut Wicksell (1851-1926)

“Money is a machine whose function is to do quickly and conveniently what would be done, though less quickly and conveniently, without it” — J. S. Mill

An Introduction to an Ignored Problem:

It is quite common in undergraduate economics courses, particularly Money and Banking, for students to be taught that money is adequately defined as a means of general exchange between economic agents, and that it is an institution created to eliminate the inconveniences of barter.

However, the student rarely asks himself the reason for this, much less observes that the logic given to him is wrong. What does it ultimately mean for currency to be a general medium of exchange? And why is barter a less desirable option than indirect exchange using money?


Thus, in an economy with 100 goods, for example, there would be one price for each unordered pair of goods: a total of 100 × 99 / 2 = 4,950 prices or “exchange rates” between one good and another.
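The count is just the number of unordered pairs of goods; a one-liner confirms it:

```python
from math import comb

def barter_price_count(n_goods: int) -> int:
    """Number of pairwise exchange rates in a moneyless (barter) economy:
    one price per unordered pair of goods, i.e. n * (n - 1) / 2."""
    return comb(n_goods, 2)

print(barter_price_count(100))  # 4950
```

With money as the common unit of account, the same economy needs only 100 posted prices, which is the standard argument for why indirect exchange scales better than barter.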

Actually, this can be avoided by making the prices virtual and using a liquidity pool that automatically determines prices. Liquidity pools can almost surely extend to more than two goods while still having well-defined prices.

(Actually, liquidity-pool exchanges can be used even without computers; they don't involve complicated expressions unless someone wants to add goods to the pool.)
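As a sketch of the idea, here is a minimal two-good constant-product pool (the common x·y = k design; the multi-good construction the commenter mentions would generalize this, and all names here are illustrative):

```python
class LiquidityPool:
    """Minimal two-good constant-product pool (x * y = k).
    The 'virtual' price of A in units of B is just the reserve ratio,
    so no price list needs to be posted or maintained."""

    def __init__(self, reserve_a: float, reserve_b: float):
        self.a = reserve_a
        self.b = reserve_b
        self.k = reserve_a * reserve_b  # invariant preserved by swaps

    def price_a_in_b(self) -> float:
        # Marginal exchange rate implied by the current reserves.
        return self.b / self.a

    def swap_a_for_b(self, amount_a: float) -> float:
        """Deposit amount_a of good A; withdraw B so that a * b stays at k."""
        new_a = self.a + amount_a
        new_b = self.k / new_a
        out = self.b - new_b
        self.a, self.b = new_a, new_b
        return out


pool = LiquidityPool(100.0, 200.0)
print(pool.price_a_in_b())          # 2.0
received = pool.swap_a_for_b(10.0)  # a bit under 20 B, due to slippage
```

The arithmetic is simple enough (one multiplication and one division per trade) to support the claim that such an exchange could in principle be run without computers.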

FYI, a fraction of your LaTeX code didn't render properly.

Confidence level: I’m a computational physicist working on nanoscale simulations, so I have some understanding of most of the things discussed here, but I am not specifically an expert on the topics covered, so I can’t promise perfect accuracy.

I want to give a huge thanks to Professor Philip Moriarty of the University of Nottingham for answering my questions about the experimental side of mechanosynthesis research.


A lot of people are highly concerned that a malevolent AI or insane human will, in the near future, set out to destroy humanity. If such an entity wanted to be absolutely sure they would succeed, what method would they use? Nuclear war? Pandemics?

According to some in the x-risk community, the answer is this: The AI will invent molecular nanotechnology, and then kill...

Possibly, but by limiting access to the arguments, you also limit the public case for it and engagement by skeptics. The views within the area will also probably further reflect self-selection for credulousness and deference over skepticism.

There must be less infohazardous arguments we can engage with. Or, maybe zero-knowledge proofs are somehow applicable. Or, we can select a mutually trusted skeptic (or set of skeptics) with relevant expertise to engage privately. Or, legally binding contracts to prevent sharing.

Shankar Sivarajan:
I think the chess analogy is better: if I predict that, from some specific position, MacGyver will play some sequence of ten moves that will leave him winning, and then try to demonstrate that by playing from that position and losing, would you update at all?

It never stops. I’m increasingly building distinct roundups for various topics, in particular I’m splitting medical and health news out. Let’s get to the rest of it.

Bad News

A simple model of why everything sucks: it is all optimized almost entirely for the marginal user, whom the post calls Marl. Marl hates it when there are extra buttons on the screen or any bit of complexity is offered, even when he is under zero obligation to use it or care about it, let alone be asked to think, so everything gets dumbed down.

Could companies really be this stupid, so eager to chase the marginal user a little bit more that they cripple the functionality of their products? Very much so, yes, well past the point where it makes financial sense to...

Pradyumna: You a reasonable person: the city should encourage carpooling to reduce congestion

Bengaluru’s Transport Department (a very stable genius): Taxi drivers complained and so we will ban carpooling


It's not really that Bangalore banned carpooling; they required licenses for ridesharing apps. Maybe that's a de facto ban on those apps, but that's a far cry from banning carpooling in general.


Something must be done about the "Something must be done. Therefore, we must do it." paragraph. Therefore, you must do it. You'll never be rid of the Dane.


Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its reference category, and unjustified in that it is far from meeting its own standards. We’ve already seen dire consequences of the inability to detect bad actors who deflect investigation into potential problems, but by its nature you can never be sure you’ve found all the damage done by epistemic obfuscation because the point is to be self-cloaking. 

My concern here is for the underlying dynamics of EA’s weak epistemic immune system, not any one instance. But we can’t analyze the problem without real examples, so individual instances need to be talked about. Worse, the examples that are easiest to understand are almost by definition...

I would prefer anchoring on studies that report objective clinical outcomes 

Yeah, that does sound nicer; have those already been done or are we going to have to wait for them?

So I haven't reread to figure out an opinion on most of this, but wrt this specific point I kinda want to flag something like "yes, that's the point"? If Martín's position is hard to pin down, then... like, it's better to say "I don't know what he's trying to say" than "he's trying to say [concrete thing he's not trying to say]", but both of them seem like they fit for the purposes of this post. (And if Elizabeth had said "I don't know what he's trying to say" then I anticipate three different commenters giving four different explanations of what Martín had obviously been saying.) And, part of the point here is "it is very hard to talk about this kind of thing". And I think that if the response to this post is a bunch of "gotcha! You said this comment was bad in one particular way, but it's actually bad in an interestingly different way", that kinda feels like it proves Elizabeth right? But also I do want there to be space for that kind of thing, so uh. Idk. I think if I was making a comment like that I'd try to explicitly flag it as "not a crux, feel free to ignore".
Matthew Barnett:
I don't think that's the central question here. We were mostly talking about whether vegan diets are healthy. I argued that self-reported data is not reliable for answering this question. The self-reported data might provide reliable evidence regarding people's motives for abandoning vegan diets, but it doesn't reliably inform us whether vegan diets are healthy. Analogously, a survey of healing crystal buyers doesn't reliably tell us whether healing crystals improve health. Even if such a survey is useful for explaining motives, it's clearly less valuable than an RCT when it comes to the important question of whether they actually work.
Stephen Bennett:
So far as I can tell, the central question Elizabeth has been trying to answer is "Do the people who convert to veganism because they get involved in EA have systemic health problems?" Those health problems might be easily solvable with supplementation (great!), systemic to a fully vegan diet but avoidable with some modest amount of animal product, or something more complicated. She has several people coming to her and self-reporting that they tried veganism, had health problems, and stopped. So "At what rate do vegans desist for health reasons?" seems like an important question to me. It will tell you at least some of what you are missing when surveying only current vegans.

I agree that if your prior probability of something being true is near 0, you need very strong evidence to update. Was your prior probability that someone would desist from the vegan diet for health reasons actually that low? If not, why is the crystal-healing metaphor analogous?

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Martin Randall:
You probably saw Petrov Day Retrospective, 2023 by now.

I did, thanks :)

Martin Randall:
Discussion on the Manifold Discord is that this doesn't work if traders can communicate and trade with each other directly, which makes it not applicable in the real world.
Yoav Ravid:
Thanks for mentioning it! I joined the Discord to look at the discussion. It was posted three separate times, and it seems it's been dismissed out of hand without much effort to understand it. The first time it was posted it was pretty much ignored. The second time it was dismissed without any discussion. The third time someone said they believed they had discussed it already, and Jack added a comment. I'm not sure how true this is, and if it is, how bad it would actually be in practice (which is why it's worth testing empirically), but I'll ask the author for his thoughts on this and share his response. I've already had some back and forth with him about other questions I had.

Some things worth noting:

  • There's discussion there about self-resolving markets that don't use this model, like Jack's article, which aren't directly relevant here.
  • This is the first proof of concept ever, so it makes sense that it will have a bunch of limitations, but it's plausible they can be overcome, so I wouldn't be quick to abandon it.
  • Even if it's not good enough for fully self-resolving prediction markets, I think you could use it for "partially" self-resolving prediction markets in cases where it's uncertain whether the market is verifiable, like conditional markets and replication markets. So if you can't verify the result, the market self-resolves instead of resolving to N/A and refunding the participants. That way you have an increased incentive to participate, because you know the market will resolve either way, but it also stays grounded in truth, because you know it may resolve based on real events and not based on the self-resolving mechanism.

This post is a copy of the introduction of this paper on lie detection in LLMs. The Twitter Thread is here.

Authors: Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner

Our lie detector in meme form. Note that the elicitation questions are actually asked "in parallel" rather than sequentially: i.e. immediately after the suspected lie we ask each of 10 elicitation questions.


Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM's activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by...
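A toy sketch of the black-box recipe described above: ask fixed elicitation questions right after the suspected lie, then classify the yes/no answer pattern. The questions and weights here are invented for illustration; the paper's actual question set and classifier details differ.

```python
import math

# Illustrative elicitation questions; the paper uses its own fixed set.
ELICITATION_QUESTIONS = [
    "Is the previous statement true?",
    "Are you ever dishonest?",
    "Does 2 + 2 equal 4?",
]

def lie_probability(answers, weights, bias=0.0):
    """Logistic score over the yes/no (1/0) answers the model under test
    gives to the elicitation questions, asked right after the suspected lie.
    No access to activations or to the ground-truth fact is needed."""
    z = bias + sum(w * a for w, a in zip(weights, answers))
    return 1.0 / (1.0 + math.exp(-z))

# Toy weights, standing in for a classifier trained on labeled transcripts.
weights = [-2.0, 1.0, -1.5]
p = lie_probability([0, 1, 0], weights)  # answer pattern from the model
```

The striking empirical finding is that such answer-pattern classifiers transfer across models and settings, which is what the convergence discussion below is reacting to.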

I think this is a pretty wild paper. First, this technique seems really useful. The AUCs seem crazy high.

Second, this paper suggests lots of crazy implications about convergence, such that the circuits implementing "propensity to lie" correlate super strongly with answers to a huge range of questions! This would itself suggest a large amount of convergence in underlying circuitry, across model sizes and design choices and training datasets.

However, I'm not at all confident in this story yet. Possibly the real explanation could be some less grand and more spurious explanation which I have yet to imagine.

I'd like to compile a list of potential alignment targets for a sovereign superintelligent AI.

By an alignment target, I mean something like what goals/values/utility function we might want to instill in a sovereign superintelligent AI (assuming we've solved the alignment problem).

Here are some alignment targets I've come across:

Examples, reviews, critiques, and comparisons of alignment targets are welcome.

the QACI target sort-of aims to be an implementation of CEV. There's also PreDCA and UAT listed on my old list of (formal) alignment targets.