# Title: Entanglement Between an AI and Its Environment

Post ID: `nacz5tFK6wJ7NFqMp`
Version: `draft`

* * *

**TL;DR:** **PLACEHOLDER.**

* * *

*This post is part of a* [*collection of posts about AI oversight and its limitations*](https://eval-limits.com/research). The [*previous post*](https://limits-of-evaluation.org/posts/test-creation-cost) *argued that the cost of testing powerful AI on complex tasks can become prohibitively high. This post introduces the concept underlying that argument: entanglement between the AI and its environment. However, this post can also be read standalone.*

* * *

Introduction: main concepts and motivation
------------------------------------------

To successfully solve a real-world task, the AI doing the solving must get some information about the environment and about its goal, and have some way of affecting the environment. A question-answering oracle needs some background information, an input channel, and an output channel. A robot has sensors and actuators. A present-day LLM-based agent might have access to a GitHub repository, various tools, and a communication channel with the user.
Broadly speaking, **the AI is entangled**[^av68hvdyszl] **with the environment**.

![entanglement-preview.png](https://res.cloudinary.com/lesswrong-2-0/image/upload/v1779996420/lexical_client_uploads/y3ekd11av8zfbnczdgk0.png)

In this post, we want to point to a few interesting concepts that come up here. First, there is the qualitative question of how exactly the AI is connected to its environment: what its input and output channels are, which information it has access to, etc. Second, there is the quantitative question of *how strong* the connection is, or *how much information* the AI has about its environment. We provisionally refer to this as the ***actual entanglement*** *between a particular instance of AI and the environment*. Third, there is the ***minimum entanglement*** *corresponding to a particular task*: a theoretical lower bound, capturing the smallest amount of actual entanglement that some AI might have while still being able to solve the task.

We are not yet confident about the best way to operationalise these concepts, so we will avoid attempting to define them formally. For example, the descriptions are suggestive of mutual information and Shannon entropy, and we agree that these are relevant concepts — but we are not confident they capture all of what we are trying to point at here.

Ultimately, we suspect that this notion of "entanglement" is relevant to AI safety, and in particular to the questions around AI evaluations and observation-based oversight. (How expensive should we expect good evaluations to be? What are the limitations of AI oversight?) We discuss some of these connections at the end of the post.

Actual entanglement
-------------------

When a specific AI is solving a specific task in a specific way, which information about the environment does the AI receive during the process? I call this *actual entanglement*.

For example, suppose the task is that you want AI to help you play chess really well.
One way to do it would be to have a humanoid AI-powered robot that plays the game on your behalf. Then the actual entanglement is that the robot sees the video recording of the whole game and gets sensory inputs from its body. It might also matter that the robot can walk around and choose what it looks at and interacts with.[^engw87nqonl] Very often, you can solve the exact same task with much less entanglement. In this case, maybe instead of a robot that can move around, you just have a stationary robot with a fixed camera. Probably it's enough if the camera is black and white. But you could go even further. You don't actually need a robot at all; the AI could be software-only, and you can manually send it what moves your opponent makes — pawn to e3 — and have it tell you what moves it recommends. Now the actual entanglement is just the history of moves in the game, with nothing about how the room looks. And you can go a bit further still, for example, by resetting the AI after every turn, so instead of the whole history of play, it only ever knows the current board state. ![image.png](https://res.cloudinary.com/lesswrong-2-0/image/upload/v1779998613/lexical_client_uploads/yszzlfqfjzssgryj9pzp.png) Minimum entanglement -------------------- If you try hard enough, you could probably squeeze the information even further. For example, some board positions are equivalent up to symmetry, and you could present the position in a canonical form, hiding which orientation is the real one. But at some point, the tricks run out. There will be some smallest amount of information you have to give the AI, below which it just can't do its job. *"If you want me to help you and tell you how you should play, you have to tell me* something *about what is happening in the game."* There won't be a unique format for giving this information. 
And maybe some bits of information are interchangeable (e.g., "you need to give the AI at least two out of these four pieces of information, but it doesn't matter which two"). But the point (or perhaps conjecture?) is that there is some reasonably well-defined boundary below which no clever trick will take you. That is what I call *minimum entanglement*: the minimum amount of information you need to give the solver if you want the task solved.[^bdii3xx5aq7] Properties ---------- Now, let's go through some of the basic properties and observations regarding how this concept behaves. ### Entanglement depends on the performance level One natural way to think about minimum entanglement is as a function of the task: "for task X, the minimum entanglement is Y." But this only works if the definition of the task already includes the performance level — for example, "play chess optimally" or "win against this particular opponent with probability >95%." If you instead think of the task as "play chess" without specifying an exact performance level, things get more interesting. Let me illustrate with a different example. Suppose the task is calculating the expenses for a particular project, and you have the information on how much each component costs. To get the exact answer, you must give the AI the exact amounts for all individual expenses — there's no way around that. But if you're willing to tolerate some error, the minimum entanglement might be lower. Maybe you're OK with an error of $1,000, in which case it might be fine to round all expenses to the nearest hundred. So the type signature is: for each task *and* each required performance level, there is some minimum entanglement. Bringing this back to chess, perhaps you could sometimes get away with giving the AI slightly incorrect information about the board state — telling it a pawn is in a different place from where it actually is. But if you do this too much, the recommendations stop working. 
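
To make the expenses example above concrete, here is a minimal Python sketch of how coarser information trades off against accuracy. The expense amounts and the helper `round_to_hundred` are made up for illustration; the only point is that rounding each item to the nearest hundred changes it by at most $50, so the worst-case error in the total grows with the number of items.

```python
def round_to_hundred(amount: int) -> int:
    """Round a non-negative dollar amount to the nearest hundred (ties round up)."""
    return 100 * ((amount + 50) // 100)


# Hypothetical expenses in dollars; these numbers are invented for illustration.
expenses = [1234, 5678, 910, 11121, 3141]

# What the solver would compute with full information.
exact_total = sum(expenses)

# What the solver would compute if it only saw the rounded amounts.
coarse_expenses = [round_to_hundred(x) for x in expenses]
coarse_total = sum(coarse_expenses)

# Each rounded item is off by at most $50, so the total is off by at most $50 per item.
worst_case_error = 50 * len(expenses)
actual_error = abs(exact_total - coarse_total)

print(exact_total, coarse_total, actual_error, worst_case_error)
```

With these particular numbers, the rounded total is off by $84, well within the worst-case bound of $250. Under this bound, a $1,000 tolerance is only guaranteed for up to 20 items, which is one way to see that the acceptable level of coarseness depends on the required performance level.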
That's why the performance level matters. ### You can't cheat by solving everything at once There is one tempting way to reduce entanglement further: instead of asking the AI how to play in your particular game, just ask it to produce a lookup table — "the optimal move for every possible board state." Now you don't need to tell the AI anything about your actual game; you just look it up. The reason this is cheating is that you have asked the AI to solve a much harder task. Instead of one particular game, you've asked it to solve all possible games, which would be vastly more expensive. I'm not sure about the best way to handle this conceptually, but maybe the right move is similar to what we did with performance: you look for the minimum entanglement needed to solve the task *without exceeding a specific computation budget*.[^2743goi1735] ### You can't cheat by reshuffling the boundary A related trick is that instead of asking the AI to tell you how to play, you ask it to give you an *algorithm* that tells you how to play. And the algorithm then does the actual work. If you do this, you reduce the entanglement between the original AI and the environment — perhaps even without increasing the computation, since the algorithm might be efficient. But this is also cheating. Yes, the entanglement between the *original* AI and the task is now small, but all the entanglement [has been reshuffled](https://www.lesswrong.com/posts/zqmAMst8hmsdJqrpR/shell-games). It now sits between the algorithm and the environment. Rather than decreasing entanglement, you have only drawn the boundary between the "AI" and the "environment" in a more convenient place.[^lxb2mph0p8d] This has practical consequences: Suppose the reason you were worried about entanglement is that an AI with too much information about the environment might try to mess with you. Unfortunately, with this reshuffling approach, you haven't solved the problem. 
Instead of messing with you directly, the original AI might just produce an algorithm that will try to mess with you on its behalf. And that algorithm has exactly the same amount of entanglement.[^0dbcnsx2qoa] ![image.png](https://res.cloudinary.com/lesswrong-2-0/image/upload/v1779998878/lexical_client_uploads/rvlrf5ejhyn6n1e28nka.png) If we create an AI which creates an algorithm which solves the task, this might involve little entanglement between the AI and the environment. But the entanglement between the algorithm and the environment remains. ### Minimum entanglement is about a class of tasks There is one more way you could try to cheat, and it's closely analogous to how you might try to cheat in computational complexity. In computational complexity, we might naively ask: "How hard is it to sort this particular array?" But if you pose the question exactly like this, the answer is trivial — you can always "solve" any *specific* task in constant time by using the algorithm that takes empty input and outputs the correct answer for that specific task. That's why we don't ask about the complexity of specific instances but about the complexity of a *class* of problems. The same principle applies here. If you fixed the exact task ahead of time, you could design a "solver" with zero entanglement — one that simply outputs the pre-determined correct answer without looking at the environment at all. (For example, a rock that has the correct answer written on it.) It only makes sense to ask about minimum entanglement if you have uncertainty about which task you're facing — i.e., about which instance from a class of tasks you are dealing with.[^iz0tyflkvn] Graceful vs. sharp degradation ------------------------------ With these properties in hand, the next question is: how does this concept behave for different types of tasks? 
I conjecture that there are tasks where performance degrades gracefully as you give the solver less information, and tasks where it degrades sharply. Some examples: **Graceful degradation.** Take the problem of vacation recommendations. Suppose I'm going to a foreign country and want suggestions for which places to visit. Specifying the "full task" would mean giving the AI all the details about my trip and about me (what I like, what I dislike, my budget, my fitness level, etc.). But for practical purposes, it's enough to say *"I'm going to country X during the tourist season, and I like hiking. Give me some reasonable tips."* It might not give me the *perfect* tips, but they'll be good enough. The AI's performance degrades slowly as we decrease entanglement, and the minimum entanglement necessary for good-enough performance is low. **Sharp degradation.** On the other hand, there are tasks where any single error in the information matters (or at least any error in some critical subset of the information does). If you have a self-driving car, giving it wrong information about the road could be catastrophic. If you want to get your taxes done, omitting some invoices or changing some numbers gives you an incorrect result — there is no "approximately correct" version of a tax return.[^yaupwmvf8s] This distinction matters because it determines how sensitive a task is to getting some of the details wrong or omitting them. With graceful degradation, I can get away with only giving the AI the most important information, and telling it to make up the rest — this won't be suspicious, since the details don't really matter that much. And as a result, creating tests for these tasks can be cheap. But if we tried the same thing with a task where performance degrades sharply, it *would* be suspicious. (*"Hey, human, what's going on here? It looks like you don't really care about getting the right answer! 
Is this perhaps a test?"*)

Empirically, actual entanglement seems to be increasing
-------------------------------------------------------

It is worth noting that in practice, the actual entanglement of AI systems has been increasing with each new generation of frontier AI. For example, the context windows keep increasing, and LLMs get access to tools and web search. They get integrated into real-world systems and databases: GitHub, email, and the editor here on LessWrong. We replace pure LLMs with LLM-based agents with scaffolding and persistent memory. And we apply AI to richer and more open-ended tasks. Each of these changes increases the amount of information about the environment that the AI can, and does, access.

This matters because with higher actual entanglement, any test environment that wants to be convincing needs to involve more information. And as AI systems become more deeply embedded in real-world workflows, the gap between "what the AI sees in deployment" and "what an evaluator can afford to fake" will grow.

Ultimately, we care about high-stakes tasks
-------------------------------------------

Admittedly, there might be real-world tasks with low minimum entanglement. Even many of them. However, the applications that matter most to me are ultimately the ones where a powerful AI might have a genuine opportunity to take over, or to significantly increase its future influence. For example:

* Helping with AI development at a frontier AI company;
* Working as a personal assistant for many people simultaneously, with the ability to coordinate between instances;
* Assisting the government or military in high-stakes situations;
* Helping run very influential companies.

How exactly minimum entanglement scales in these cases is an open question. And we could make progress on it, for example by thinking directly about how to make evaluations for these specific scenarios.
However, in this post, I want to highlight a different agenda: trying to understand the concept of minimum entanglement in general — how it works for all kinds of tasks, what its mathematical properties are, and how it behaves empirically for simpler tasks. That understanding can then be used to answer the question for the high-stakes tasks we care about the most.

Open questions
--------------

There are several directions that seem worth pursuing:

**Formalization.** What is the right mathematical definition of minimum entanglement? Mutual information between the AI's outputs and the environment state is one candidate, but I'm not confident it's the right one. It might also make sense to look at formalizations from related fields — [query complexity](https://en.wikipedia.org/wiki/Query_complexity) (how many queries to an oracle are needed to compute a function?), or [distortion in social choice theory](https://arxiv.org/abs/2103.00911) (how much does solution quality degrade when the solver gets less information about preferences?).[^giumlplalc]

**Empirical measurement.** For some of the concrete tasks we are trying to solve with AI, and some of the tasks we are trying to test for: how much entanglement do they require? How much effort went into creating various test environments? When we compare (a) minimum entanglement with (b) how much testing effort was actually used, is this predictive of (c) how successful the tests were? Is it predictive of when AI systems seem to have or not have evaluation awareness?

**Dollars as a proxy.** To start with, it might also make sense to just measure some of this in dollars or person-hours. How long do I need to spend instructing my LLM if I want it to do a task for me? How much effort goes into crafting instructions and environments for AIs? And conversely, how much money went into creating datasets, test environments, and red teaming?
Is the ratio between "effort spent on real use" and "effort spent on testing" predictive of tests being informative? (For example, can you use this to predict whether the AI has evaluation awareness?) This is less theoretically satisfying than a formal definition, but it might be more immediately actionable.

Connections to AI evaluation
----------------------------

While we find the notion of entanglement interesting on its own, we primarily bring it up because we think it has implications for AI oversight. As the final part of this post, we discuss four ideas on this topic.

### (1) Minimum entanglement gives a lower bound on the cost of creating environments

By definition, minimum entanglement corresponds to the smallest amount of information without which it becomes impossible to give an acceptable solution to the task. When somebody wants to solve some real-world task, they will necessarily provide the solver with this information — or else they are either making a mistake or not really trying to solve the task. In other words, most high-stakes deployment situations will give the AI at least this much entanglement.

Therefore, having less than the minimum entanglement becomes a strong signal of something unusual going on — typically an evaluation, a simulation or roleplay scenario, or an interaction with a user who doesn't really care about the answer. Different AIs might react to this signal differently. For example, a scheming AI might reason about it explicitly to gain evaluation awareness. However, many AIs will instead "subconsciously" notice that they seem to have much less context than expected in a typical interaction, and start behaving off-distribution (as reported [here](https://www.lesswrong.com/posts/aTcsN5ZZDnMFJvRiG/models-may-behave-worse-when-eval-aware)). Either way, the test becomes much less informative of the AI's deployment behaviour.
This puts a lower bound on the complexity of any testing environment, which in turn impacts the effort (cost, resources, ...) required to create an effective evaluation. (This deserves some caveats and further discussion, some of which is in the previous post, [If This Were a Test, How Much Would It Cost?](https://www.lesswrong.com/posts/9cLQttpPTxogd2Gea/if-this-were-a-test-how-much-would-it-cost-1).)

### (2) Higher minimum entanglement means more opportunity for spotting inconsistencies

All of our arguments so far would work even if the individual pieces of information that the AI gets were perfectly independent of each other. With higher minimum entanglement, our test environments need to be more complex, which makes them more difficult to create.

However, in practice, the information the AI gets won't be neatly factored into independent buckets. If we make any of the information up[^z4cjp1urg5], we risk creating inconsistencies, which the AI might notice.[^x7gbryzq27f] At best, this means that the cost of creating tests would grow linearly with the amount of information we need to make up. However, it seems likely that the difficulty grows faster (e.g., with the number of interactions between facts). As a result, the lower bound from (1) might be higher than it might naively seem.

### (3) Actual entanglement is higher than minimum entanglement

In practice, the difficulty of creating effective tests will depend on the *actual* entanglement. If we decide that making AIs easier to test is important, we could start deliberately reducing their exposure to unnecessary information during deployment. However, in practice, we have not yet started with this mission at all, much less pushed it to the theoretical limit of minimum entanglement (which likely wouldn't be cost-effective anyway). As a result, we should think of the cost of test-creation as being lower-bounded by actual entanglement, which is much higher than the minimum.
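
As a rough illustration of the "grows faster than linearly" point in (2), the sketch below counts how many pairs of facts exist among a given number of made-up facts. It assumes, purely for illustration, that every pair of fabricated facts is a potential inconsistency that a test designer would have to keep straight; the post does not commit to this model, and the function name `pairwise_interactions` is hypothetical.

```python
from math import comb


def pairwise_interactions(num_facts: int) -> int:
    """Count the unordered pairs among num_facts fabricated facts."""
    return comb(num_facts, 2)


# Illustrative sizes only: compare the number of facts to the number of pairs.
for n in (10, 100, 1000):
    print(f"{n} facts -> {pairwise_interactions(n)} pairs to keep consistent")
```

Under this assumption, a tenfold increase in the amount of fabricated information leads to roughly a hundredfold increase in the number of pairs to check. Real environments may be better or worse than this, depending on how strongly the facts actually interact.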
### (4) Conjecture: Many high-impact tasks have high entanglement

Finally, there are many reasonably low-entanglement tasks (vacation recommendations?) or even some tasks that are both low-entanglement and important (Riemann hypothesis?). However, we suspect that most important tasks will have high entanglement. For example, accelerating AI R&D within an AI company, curing cancer, or making geopolitical decisions. In other words: Yes, there will be many tasks where we could afford to create perfectly realistic tests that would fool even a future ASI. But these are probably not the tasks where those tests are needed the most.

[^av68hvdyszl]: I don't think that any of the existing concepts — e.g., in information theory, computational complexity, or some other field — give a perfect operationalization of the concept we are looking for here. However, I might easily be wrong about this; if so, please let me know! Also, admittedly, the word is not great. Partly, this is because of the connotation with quantum physics. Partly, there are several closely related things where the name would also fit — for example, the fact that facts about the world, including the ones the AI observes, are not independent (and so they are "entangled together"). There is also the issue that "entanglement" suggests the relationship is symmetric, while the connection between an AI and the environment is asymmetric — the AI receives information and affects the world in response, but the amount of information going in each direction won't be the same.

[^engw87nqonl]: I'm not clear on how much this active-sensing part matters or how to formalize it, but it seems like it might be relevant. An AI that can choose what to look at might need to be treated differently from one that passively receives information.

[^bdii3xx5aq7]: Reading this, you might be thinking about Shannon entropy and mutual information, and these may well be relevant concepts.
But at this point, I'm not clear on what the best operationalization is — whether it should be a single number, some more multi-dimensional thing, or something else entirely. I want to avoid getting anchored by treating, say, mutual information as if it's definitely the right notion here.

[^2743goi1735]: This isn't a perfect solution, but it's pointing in a hopefully reasonable direction. The general issue that needs solving is that you can always trade entanglement for computation.

[^lxb2mph0p8d]: In a sense, this is a version of the oft-repeated point that you cannot "just" replace an agent with a planning oracle, because most complicated tasks cannot be "one-shotted" by specifying a "non-agentic program". Rather, those plans will involve the need to reconsider and revise the plan in the course of its execution, as well as reasoned adaptation/interpretation of the plan according to the specific encountered circumstances. The greater the number of such points of intervention, the more unviable it is to execute the plan in a "non-agentic" way. (Mateusz: The best existing reference on this topic that I'm familiar with is this comment by Nate Soares).

[^0dbcnsx2qoa]: This connects to a broader issue in AI safety: People sometimes propose to decompose a task into smaller pieces with the rationale that the individual tasks are easier, and so we can get away with using a weaker AI that is less dangerous. Or we might hope that the individual tasks will be easier to monitor. Or the decomposition might allow us to give each individual AI a smaller amount of context, which would make scheming and evaluation gaming harder. A crucial worry is that this framing obscures the fact that finding the right decomposition and assembling the solutions of the sub-tasks into a solution of the original task are often the hardest parts of the problem. So the decomposition might not have given us that much after all.
In the language of entanglement: If we rely on decomposition, the task of finding the decomposition and assembling the final solution might have the same minimum entanglement as the original task.

[^iz0tyflkvn]: By uncertainty, I don't necessarily mean a probability distribution. It might be Knightian uncertainty, where you just know the task is one of several possibilities, but you don't have a prior over them.

[^yaupwmvf8s]: I'm not fully confident in these particular examples — the self-driving car might handle some wrong inputs gracefully, and "approximately correct" taxes might be good enough for some purposes. What I'm pointing at is the distinction between these two types of degradation, which I think is real (even if my specific examples aren't perfect).

[^giumlplalc]: Query complexity gives tools for proving lower bounds on how much interaction a solver needs. Distortion measures worst-case welfare loss from using rankings instead of utilities. In other words, it asks "how much performance are we losing by using a less detailed task description". For a reference, see Anshelevich et al., "Distortion in Social Choice Problems: The First 15 Years and Beyond," IJCAI 2021.

[^z4cjp1urg5]: Crucially, if we want to test hypothetical situations, we do have to make things up; or else the situation would be real. The best case here is what I would call "on-policy testing": Me testing what an AI would do when used for writing LessWrong posts and grant applications (which I already do and use AI for), or an AI company testing what their AI would do when helping with AI development (which they are already doing). As we argued in a previous post, this still requires making up some things, but much less than if each of us ran tests on something we have no connection to. However, note that this "on-policy testing" approach doesn't let us test the situations that matter the most, such as behaviour during genuine takeover opportunities.
[^x7gbryzq27f]: I sometimes encounter the proposal to introduce deliberate inconsistencies during deployment, in order to prevent the presence of inconsistencies from serving as a reliable signal of being in evaluation. I mostly just think this is a bit silly, and won't work, for various reasons. However, this post offers a nice way to articulate one of the objections: Yes, you can introduce inconsistencies in the information that is mostly irrelevant to the task. However, you can't do this with the critical information, because if you did, you would no longer get an acceptable solution. As a result, the AI can mentally assign much higher importance to inconsistencies in the crucial data — those are the ones that point to an imperfectly created test environment.

[^b24cgwrkytk]: If This Were a Test, How Much Would It Cost? develops the testing implications in detail — both the concrete argument and potential countermeasures. This post focuses on the underlying concept.

[^izg9y9gbi1]: One might object: if nobody has all of the relevant information, wouldn't the AI also have a hard time telling a fake scenario from a real one? Probably not — much like it is much easier to spot plot holes in stories than it is to write stories without plot holes. You don't need to know all the details to notice that the details you do have are inconsistent with each other.

[^3pmbk6wse5a]: Depending on how we operationalise what counts as "an important problem", some famous problems, like the Riemann hypothesis, might qualify as a counterexample to our claim here. Our impression is that these "counterexample problems" are often self-contained and might not have that big of an impact on the world. That said, maybe there are some problems which are both "low entanglement" and high-impact (or even just "likely to make the person who solves them extremely rich"). We couldn't think of any, but if you can, we would be curious to hear about them!

* * *

## Comment Threads

12 open threads.
To reply: POST /api/agent/replyToComment { postId, key, threadId, comment } ### Thread `gtivg` · comment > That said, maybe there are some problems which are both "low entanglement" and high-impact (or even… **Mateusz Bagiński** (2026-06-23, 12:59): Bitcoin? **Mateusz Bagiński** (2026-06-23, 13:01): Also, depending on how you further specialize various things, Grigori Perelman didn't need much entanglement to solve the Poincaré Conjecture and would have gotten quite rich, had he accepted the prize. **Mateusz Bagiński** (2026-06-23, 13:03): (re bitcoin) More generally, figuring out a piece of math that the world unknowingly needs for making something more efficient. Within econ, the whole cluster of ideas around Georgism/Harberger taxes/Freiwirtschaft is also one such thing (probably), but the inadequate equilibrium we're in makes gradual test-adoption of them prohibitively difficult. **VojtaKovarik** (2026-06-25, 13:54): Eh. 1M USD isn't "extremely rich" in my book :-). **VojtaKovarik** (2026-06-25, 13:56): But to the more general point: I don't find any of these particularly convincing. I somewhat doubt that inventing the math behind bitcoin (or whatever else) would, *by itself*, make you rich. ### Thread `ckfof` · comment > Why does this conjecture matter? **Mateusz Bagiński** (2026-06-23, 13:08): the whole premise of this seems to be highly resonant with many things discussed in https://www.lesswrong.com/s/n3utvGrgC2SGi9xQX (to the extent I remember it well, as I read it long time ago) **VojtaKovarik** (2026-06-25, 14:05): Hm, I don't see the connection :-( . But likewise, it has been a long time since I have read that sequence. And also I think I only picked particular things from it. 
**Mateusz Bagiński** (2026-06-30, 13:02): I think I had in mind that the sequence/paper was repeatedly saying something like: You cannot get a very good predictor to predict what's going to happen, given XYZ, unless the way you supply evidence/information abt XYZ to it looks "credible"/"realistic"/"natural". Moreover, if you supply this evidence/information in a way that is not "credible"/"realistic"/"natural", then it will start believing it's being evaluated or even in some other, very convoluted scenario, and the outputs will not be very informative to you at all. **VojtaKovarik** (2026-06-30, 21:26): Right, that I would endorse. If you could remember some more specific bit of it that we could cite, that would make sense to acknowledge as a reference. ### Thread `vsqkr` · comment > Now, very often you might be able to solve the exact same task with much less entanglement. **Mateusz Bagiński** (2026-06-23, 13:16): https://www.lesswrong.com/posts/FuGfR3jL3sw6r8kB4/richard-ngo-s-shortform#mCbi8qQjo4fye8frw **VojtaKovarik** (2026-06-25, 14:00): Interesting thought, thanks for sharing? Relatedly, was that meant for my curiosity, or as a suggestion to add it as a link? **Mateusz Bagiński** (2026-06-30, 13:23): Just a curiosum, but I think there might be some fruitful ideas there. ### Thread `dydxp` · comment > If true, this > would imply that creating believable evaluations for important tasks will be expensi… **Mateusz Bagiński** (2026-06-30, 16:43): maybe reframe this to: if you want to solve an important task with an AI, you typically need to supply the AI a lot of information about the task, and it is very hard to do this without also providing it a lot of information about the environment. If true, this would imply that creating believable evaluations for important tasks will be expensive — maybe prohibitively so — because the AI will notice that the information provided about the environment (if not about the task) looks "artificial"/fabricated. 
**Mateusz Bagiński** (2026-06-30, 16:46): If the general reframing I suggested sounds good, it probably makes sense to just put more text at the beginning of the post.

**VojtaKovarik** (2026-06-30, 21:24):

- The reframing sounds reasonable. (Any longer than that, and it would stop being okay length for tl;dr, imo.)
- The "if not about the task" bit is unclear.
- Unclear about how to formulate the artificial/fabricated bit. An important part here is that the AI might notice that it is internally inconsistent. Which I guess you could say falls under noticing it is artificial.

**VojtaKovarik** (2026-06-30, 21:25): Not sure what exactly you mean by putting text at the beginning of the post. (Whether it means extending the tl;dr or putting more text below the italic text and above "why does this matter" or something else.)

**Mateusz Bagiński** (2026-07-01, 12:54): I mean adding one more section that is something like an extended TL;DR that explains/reframes the conjecture in the way I suggested.

**Mateusz Bagiński** (2026-07-01, 13:24): Ok, actually slightly extending the tldr should be fine.

**Mateusz Bagiński** (2026-07-01, 13:25): re: "unclear about how to formulate the artificial/fabricated bit": I am more worried about the "*important* task" bit being unclear.

### Thread `yhuvg` · comment

> you could design a solver with zero entanglement

**Mateusz Bagiński** (2026-07-01, 14:24): The entanglement would be moved to the solver though.

**Mateusz Bagiński** (2026-07-01, 14:26): Similarly to how, when you have two Turing machines T and T* such that for any input x, T*(x)=T(p), where p is some program, there is a sense in which T* has p compressed in it, and the complexity of p (and/or its output) now sits in T*.

**VojtaKovarik** (2026-07-05, 10:44): I basically meant a rock that has the correct answer written on it. I didn't think about the Turing machine example in detail, but my sense is that this is a counterexample. Or at least the rock has like zero entanglement.
Obviously, what actually happens in this exact example is that *I* have all the entanglement, since I solved the task to be able to write the solution on the rock. Though you could posit the random existence of the rock with the right solution on it, which would break that thing also.

### Thread `naaqj` · comment

> loose

**Mateusz Bagiński** (2026-07-01, 14:37): not sure about this but saying "loose collection" might appear self-deprecating or something

**VojtaKovarik** (2026-07-02, 16:48): The intent was to signal that the "sequence" isn't a traditional sequence in that there would be a big advantage in reading it in order. The key distinction is "collection of posts" vs "a structured, linear document".

### Thread `xfntx` · comment

> TL;DR: PLACEHOLDER.

**Mateusz Bagiński** (2026-07-03, 11:09): TODO once we have the rest

### Thread `dlcqe` · comment

> [1]

**Mateusz Bagiński** (2026-07-03, 11:43): @Vojta move this footnote earlier?

### Thread `duyni` · comment

> the complexity of a class of problems

**Mateusz Bagiński** (2026-07-03, 11:56): I'm a bit of a computational complexity noob, but doesn't the "complexity of a class of problems" approach also admit a degenerate solution in the form of a lookup table, where every problem is mapped to its solution? (I guess this doesn't work if the class of problems is infinite, but this isn't said here explicitly.)

**VojtaKovarik** (2026-07-05, 10:46): Eh. I guess that is right. Not sure how to solve that. I imagine leaving it as it is is better than deleting it or adding too much extra text. If you can think of a small amount of text that would fix it, go ahead :).
### Thread `tuvnq` · comment

> I conjecture that there are tasks where performance degrades gracefully as you give the solver less…

**Mateusz Bagiński** (2026-07-03, 11:59): This conjecture admits a dual: On some tasks, you can notice sharp/non-linear/discontinuous rises in performance (whatever that means) as you cross some level of information provided to the AI.

**VojtaKovarik** (2026-07-05, 10:39): +1

### Thread `phiyl` · comment

> It is worth noting that in practice, the actual entanglement of AI systems has been increasing with…

**Mateusz Bagiński** (2026-07-03, 12:02): On the one hand, yeah. On the other hand, the minimum entanglement has been decreasing, because smarter AIs can do with less info, because they have better biases/priors, more "think-oomph", etc. Not sure how to assess the size of this effect, but at the very least, you need to prompt the model less and less for most tasks to get it to do what you want. This suggests that the type signature of minimum entanglement might be more complicated.

**VojtaKovarik** (2026-07-05, 10:32): I agree with that observation, and the conclusion that the type-signature is tricky. One reaction is something like "well, the entanglement is still there. It's just that we already paid it during training". But that has implications. For example, suppose I wanted to ask for vacation recommendations. If I had an ASI that knows nothing about our world, I would first need to describe our world and my preferences. When I have an LLM today, I need to describe my preferences. Some future AI that knows much more about our world might be fine if we just ask it "for vacation recommendations for Vojtech Kovarik, the one who works at CTU", and it will already know everything else.

### Thread `nwgxk` · comment

> Connections to AI evaluation

**VojtaKovarik** (2026-07-05, 10:23): TODO

* * *

### Navigation

* [Front page](https://www.lesswrong.com/api/home)
* [Markdown API documentation](https://www.lesswrong.com/api/SKILL.md)