All of David Johnston's Comments + Replies

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

I find lying easy and natural unless I'm not sure if I can get away with it, and I think I'm more honest than the median person!

(Not a lie)

(Honestly)

Where I agree and disagree with Eliezer

I agree with this. If the key idea is, for example, that optimising imitators generalise better than imitations of optimisers, or, for a second example, that they pursue simpler goals, then it seems to me that it'd be better just to draw distinctions based on generalisation or goal simplicity and not on optimising imitators versus imitations of optimisers.

Limits to Legibility

A person new to AI safety evaluating their arguments is roughly at a similar position to a Go novice trying to make sense of two Go grandmasters disagreeing about a board

I don't think the analogy is great, because Go grandmasters have actually played, lost and (critically) won a great many games of Go. This has two implications: first, I can easily check their claims of expertise. Second, they have had many chances to improve their gut-level understanding of how to play the game of Go well, and this kind of thing seems to me to be necessary to develop expe... (read more)

2Jan_Kulveit1mo
In case of AI safety, the analogy maps through things like past research results, or general abilities to reason and make arguments. You can check the claim that e.g. Eliezer historically made many good non-trivial arguments about AI, where he was the first person, or one of the first people to make them. While the checking part is less easy than in chess, I would say it's roughly comparable to high level math, or good philosophy.
Where I agree and disagree with Eliezer

10. AI systems will ultimately be wildly superhuman, and there probably won’t be strong technological hurdles right around human level. Extrapolating the rate of existing AI progress suggests you don’t get too much time between weak AI systems and very strong AI systems, and AI contributions could very easily go from being a tiny minority of intellectual work to a large majority over a few years.

 

I think there will be substantial technical hurdles along the lines of getting in-principle highly capable AI systems to reliably do what we want them to, th... (read more)

Where I agree and disagree with Eliezer

I've written a few half-baked alignment takes for Less Wrong, and they seem to have mostly been ignored. I've since decided to either bake things fully, look for another venue, or not bother, and I'm honestly not particularly enthused about the fully bake option. I don't know if anything similar has had any impact on Sam's thinking.

why assume AGIs will optimize for fixed goals?

I'm not sure exactly how important goal-optimisation is. I think AIs are overwhelmingly likely to fail to act as if they were universally optimising for simple goals compared to some counterfactual "perfect optimiser with equivalent capability", but this failure only matters if the dangerous behaviour is only executed by the perfect optimiser.

They're also very likely to act as if they are optimising for some simple goal X in circumstances Y under side conditions Z (Y and Z may not be simple) - in fact, they already do. This could easily be enough for da... (read more)

AGI Ruin: A List of Lethalities

I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.

Slightly relatedly, I think it's possible that "causal inference is hard". The idea is: once someone has worked something out, they can share it an... (read more)

3Gerald Monroe2mo
This is what struck me as the least likely to be true from the above AI doom scenario. Is diamondoid nanotechnology possible? Very likely it is, or something functionally equivalent. Can a sufficiently advanced superintelligence infer how to build it from scratch solely based on human data? Or will it need a large R&D center with many, many robotic systems that conduct experiments in parallel to extract the information required about the specific details of physics in our actual universe, not the very slightly incorrect approximations a simulator will give you? The 'huge R&D center so big you can't see the end of it' is somewhat easier to regulate than the 'invisible dust the AI assembles with clueless stooges'.
AGI Ruin: A List of Lethalities

As I said (a few times!) in the discussion about orthogonality, indifference about the measure of "agents" that have particular properties seems crazy to me. Having an example of "agents" that behave in a particular way is enormously different from having an unproven claim that such agents might be mathematically possible.

AGI Ruin: A List of Lethalities

A Go AI that learns to play Go via reinforcement learning might not "have a utility function that only cares about winning Go". Using standard utility theory, you could observe its actions and try to rationalise them as if they were maximising some utility function, and the utility function you come up with probably wouldn't be "win every game of Go you start playing" (what you actually come up with will depend, presumably, on algorithmic and training regime details). The reason why the utility function is slippery is that the AI is fundamentally an adaptation executor, not a utility maximiser.

AGI Ruin: A List of Lethalities

FWIW self-supervised learning can be surprisingly capable of doing things that we previously only knew how to do with "agentic" designs. From that link: classification is usually done with an objective + an optimization procedure, but GPT-3 just does it.

AGI Ruin: A List of Lethalities

My view is that if Yann continues to be interested in arguing about the issue then there's something to work with, even if he's skeptical, and the real worry is if he's stopped talking to anyone about it (I have no idea personally what his state of mind is right now).

There's probably a tradeoff between AI capability and safety, and we should act like it

Indeed. If the idea of a tradeoff wasn't widely considered plausible I'd have spent more time defending it. I'd say my contribution here is the "and we should act like it" part.

AGI Ruin: A List of Lethalities

For a very hand-wavy sketch of how that might go, consider asking GPT-N to generate 1000s of candidate high-level plans, then rate them by feasibility, then break each plan into steps and re-evaluate, etc

FWIW, I'd call this "weakly agentic" in the sense that you're searching through some options, but the number of options you're looking through is fairly small.

It's plausible that this is enough to get good results and also avoid disasters, but it's actually not obvious to me. The basic reason: if the top 1000 plans are good enough to get superior performan... (read more)
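To make the "weakly agentic" reading concrete, here is a minimal sketch of the generate-rank-refine loop described in the quote. The generate_plans, score_feasibility and refine callables stand in for hypothetical model queries; none of this is a real API.

```python
# Hypothetical sketch of the quoted generate-and-rank loop.
# generate_plans / score_feasibility / refine stand in for calls to a
# language model; none of these are real APIs.
from typing import Callable, List

def weak_search(
    generate_plans: Callable[[str, int], List[str]],
    score_feasibility: Callable[[str], float],
    refine: Callable[[str], List[str]],
    task: str,
    n_candidates: int = 1000,
    n_keep: int = 50,
    rounds: int = 3,
) -> List[str]:
    """Generate many candidate plans, keep the most feasible, break them
    into steps and re-score.  The search is 'weak' in the sense that only
    a few thousand options are ever considered."""
    plans = generate_plans(task, n_candidates)
    for _ in range(rounds):
        plans.sort(key=score_feasibility, reverse=True)
        plans = plans[:n_keep]
        # Expand each surviving plan into more detailed sub-plans.
        plans = [step for plan in plans for step in refine(plan)]
    plans.sort(key=score_feasibility, reverse=True)
    return plans[:n_keep]
```

Whether the top-ranked plans are safe then depends on how the scorer generalises, not just on the small size of the search.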

5ESRogs2mo
Separately from whether the plans themselves are safe or dangerous, I think the key question is whether the process that generated the plans is trying to deceive you (so it can break out into the real world or whatever). If it's not trying to deceive you, then it seems like you can just build in various safeguards (like asking, "is this plan safe?", as well as more sophisticated checks), and be okay.
AGI Ruin: A List of Lethalities

Right, but the goal is to make AGI you can point at things, not to make AGI you can point at things using some particular technique.

(Tangentially, I also think the jury is still out on whether humans are bad fitness maximizers, and if we're ultimately particularly good at it - e.g. let's say, barring AGI disaster, we'd eventually colonise the galaxy - that probably means AGI alignment is harder, not easier)

AGI Ruin: A List of Lethalities

Hm, regardless it doesn't really move the needle, so long as people are publishing all of their work. Developing overpowered pattern recognizers is similar to increasing our level of hardware overhang. People will end up using them as components of systems that aren't safe.

I strongly disagree. Gain-of-function research happens, but it's rare because people know it's not safe. To put it mildly, I think reducing the number of dangerous experiments substantially improves the odds of no disaster happening over any given time frame.

AGI Ruin: A List of Lethalities

Do whatever you want, obviously, but I just want to clarify that I did not suggest you avoid personally criticising people (only that you avoid vague/hard to interpret criticism) or saying you think doom is overwhelmingly likely. Some other comments give me a stronger impression than yours that I was asking you in a general sense to be nice, but I'm saying it to you because I figure it mostly matters that you're clear on this.

AGI Ruin: A List of Lethalities

3. The system can think about atoms/physics, and it knows that our world exists, but it still only terminally cares about digital things in the simulated environment.

Case 3 is not safe, because controlling the physical world is a useful way to control the simulation you're in. (E.g., killing all agents in base reality ensures that they'll never shut down your simulation.)

Not necessarily. Train something multimodally on digital games of Go and on, say, predicting the effects of modifications to its own code on its success at Go. It could be a) good at... (read more)

AGI Ruin: A List of Lethalities

Humans can, to some extent, be pointed to complicated external things. This suggests that using natural selection on biology can get you mesa-optimizers that can be pointed to particular externally specifiable complicated things. Doesn't prove it (or, doesn't prove you can do it again), but you only asked for a suggestion.

7Eliezer Yudkowsky2mo
Humans can be pointed at complicated external things by other humans on their own cognitive level, not by their lower maker of natural selection.
1Rob Bensinger2mo
To my eye, this seems like it mostly establishes 'it's not impossible in principle for an optimizer to have a goal that relates to the physical world'. But we had no reason to doubt this in the first place, and it doesn't give us a way to reliably pick in advance which physical things the optimizer cares about. "It's not impossible" is a given for basically everything in AI, in principle, if you have arbitrary amounts of time and arbitrarily deep understanding.
AGI Ruin: A List of Lethalities

What do you think of a claim like "most of the intelligence comes from the steps where you do most of the optimization"? A corollary of this is that we particularly want to make sure optimization intensive steps of AI creation are safe WRT not producing intelligent programs devoted to killing us.

Example: most of the "intelligence" of language models comes from the supervised learning step. However, it's in-principle plausible that we could design e.g. some really capable general purpose reinforcement learner where the intelligence comes from the reinforcem... (read more)

2ESRogs2mo
This seems probably right to me. I agree that reinforcement learners seem more likely to be agent-y (and therefore scarier) than self-supervised learners.
AGI Ruin: A List of Lethalities

I'm sorry to hear that your health is poor and you feel that this is all on you. Maybe you're right about the likelihood of doom, and even if I knew you were, I'd be sorry that it troubles you this way.

I think you've done an amazing job of building the AI safety field and now, even when the field has a degree of momentum of its own, it does seem to be less focused on doom than it should be, and I think you continuing to push people to focus on doom is valuable.

I don't think it's easy to get people to take weird ideas seriously. I've had many experiences whe... (read more)

I vehemently disagree here, based on my personal history (generalizable or not). I will illustrate with the three turning points of my recent life.

First step: I stumbled upon HPMOR, and Eliezer's way of looking straight into the irrationality of all our common ways of interacting and thinking was deeply shocking. It made me feel like he was in a sense angrily pointing at me, someone who worked more like one of the NPCs than Harry. I heard him telling me: you're dumb, and all your ideals of making intelligent decisions, being the gifted kid and being smarter th... (read more)

This kind of post scares away the person who will be the key person in the AI safety field, if we define "key person" as the genius main driver behind solving it rather than the loudest person. Which is rather unfortunate, because that person is likely to read this post at some point.

I don't believe this post has any "dignity", whatever weird obscure definition dignity has been given now. It's more like flailing around in death throes while pointing fingers and lauding yourself than it is a solemn battle stance against an oncoming impossible enemy.

For contex... (read more)

I disagree strongly. To me it seems that AI safety has long punched below its weight because its proponents are unwilling to be confrontational, and are too reluctant to put moderate social pressure on people doing the activities which AI safety proponents hold to be very extremely bad. It is not a coincidence that among AI safety proponents, Eliezer is both unusually confrontational and unusually successful.

This isn't specific to AI safety. A lot of people in this community generally believe that arguments which make people feel bad are counterproductive ... (read more)

It seems worth doing a little user research on this to see how it actually affects people. If it is a net positive, then great. If it is a net negative, the question becomes how big of a net negative it is and whether it is worth the extra effort to frame things more nicely.

There's a point here about how fucked things are that I do not know how to convey without saying those things, definitely not briefly or easily.  I've spent, oh, a fair number of years, being politer than this, and less personal than this, and the end result is that people nod along and go on living their lives.

I expect this won't work either, but at some point you start trying different things instead of the things that have already failed.  It's more dignified if you fail in different ways instead of the same way.

2Yitz2mo
Strongly agree with this, said more eloquently than I was able to :)
Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

We could just replace all the labels with random strings, and the model would have the same content

I think this is usually incorrect. The variables come with labels because the data comes with labels, and this is true even in deep learning. Stripping the labels changes the model content: with labels, it's a model of a known data generating process, and without the labels it's a model of an unknown data generating process.

Even with labels, I will grant that many joint probability distributions are extremely hard to understand.

Probability that the President would win election against a random adult citizen?

Say there's 5% chance they're equally or more electable, 2% they're substantially more electable.

That's what you're after, right?

2Daniel Kokotajlo2mo
Yes, exactly. That's why I said "millions."
Probability that the President would win election against a random adult citizen?

If 2% of the population are more electable, all else equal, than the sitting president (and this is a pretty wild guess), then I think you'd need a pretty good selection procedure to produce candidates who are, on average, better than those the current procedure produces.

2Daniel Kokotajlo2mo
Really? I feel like there are loads of selection procedures that reliably discern much smaller populations than that. For example, I would guess that if you were an elite university and you wanted a selection process such that the people you admit are each probably within the top 0.1% of the nation by academic ability, you can do that. (SAT tests, GPA, etc.) I would also guess that athletes in the Olympics are in the top 0.1% of the world, plausibly top 0.001% or more.
What is the state of Chinese AI research?

Are the unranked Chinese exascale systems relevant for AI research, or is it more that if they've built 2-3 such systems semi-stealthily, they might also be building AI-focused compute capacity too?

What is the state of Chinese AI research?

For what it's worth, I agree that there's clear evidence of ill-will towards the Chinese government (and, you know, I don't like them either). It's reasonable to suspect that this might colour a person's perception of the state of things that the Chinese government is involved with. It is also superficial, so it's not like I can draw any independent conclusions from it to defray suspicions of bias. I'm also not giving it a lot of weight.

Probability that the President would win election against a random adult citizen?

Not a very principled answer, but: 98%

There is already a substantial preference for incumbents (about 65-35 I think), and I think this would be much stronger if the challenger was completely unaffiliated with politics (I want to say something like 90-10 if the challenger and sitting president were equally electable otherwise, maybe 75-25 if the challenger is actually substantially better than the president, 99-1 if they're just average).

Say there's 5% chance they're equally or more electable, 2% they're substantially more electable. Then there's 0.5% on them b... (read more)
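For what it's worth, the arithmetic behind the 98% seems to be roughly the following; only the figures stated in the comment (2%, 5%, and the 75-25, 90-10 and 99-1 splits) come from the text, the rest is a reconstruction of the truncated passage.

```python
# Rough reconstruction of the stated breakdown; the split of the remaining
# 95% and the final rounding are assumptions, only the quoted figures
# (2%, 5%, 25-75, 10-90, 1-99) come from the comment.
p_sub_better, p_win_if_sub_better = 0.02, 0.25   # "75-25" case
p_equal_only, p_win_if_equal     = 0.03, 0.10    # "90-10" case (5% - 2%)
p_average,    p_win_if_average   = 0.95, 0.01    # "99-1" case

p_challenger = (p_sub_better * p_win_if_sub_better   # 0.5%
                + p_equal_only * p_win_if_equal      # 0.3%
                + p_average * p_win_if_average)      # 0.95%
print(f"P(random challenger wins) ≈ {p_challenger:.3f}")   # ≈ 0.0175
print(f"P(incumbent wins) ≈ {1 - p_challenger:.0%}")        # ≈ 98%
```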

2Daniel Kokotajlo2mo
You are the only person who has actually answered the question; thank you. The thing that interests me about this is that it seems like if you are right, both major parties are doing a terrible job of selecting candidates to run against the sitting president. There seem to be millions of people who would win. Maybe they care about more than just winning the election? Or maybe even though there are millions of people who would win, there's no one who has a legibly higher chance of winning than [whatever candidate the GOP ends up selecting?]
An academic journal is just a Twitter feed

I think durability is a really important feature of journal articles. I often read 70 year old articles, and rarely read 70 year old anything else.

I'm not sure what's responsible for the durability, mind you. Long-term accessibility is necessary, obviously, but not sufficient. Academic citation culture is also part of it, I think.

Zenodo is a pretty accepted solution to data-durability in academia (https://zenodo.org). There's no reason you couldn't upload papers there (and indeed they host papers/conference proceedings/etc.). Uploads get assigned a DOI and get versioning, get indexed for citation purposes, etc.

If I were starting a journal it would probably look like "Zenodo for hosting, some AirTable or GitHub workflow for (quick) refereeing/editorial workflow."

3David Hugh-Jones3mo
That is true, but I also think journal editors will internalize that. And it's easy to fetishize this stuff – electronic formats die out, so let's engrave all our journals on stone tablets! – but arguably, any important article exists in 50 versions on the web, and will eventually be preserved, so long as anyone cares about it. At least, that seems to have happened so far.
Visible Homelessness in SF: A Quick Breakdown of Causes

Does San Francisco look like more of an outlier if you plot unsheltered homeless vs house price?
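A quick sketch of the suggested plot, assuming one has per-city figures to hand; cities.csv and its column names are placeholders, not data from the post.

```python
# Sketch of the suggested plot; 'cities.csv' and its column names are
# placeholders, not data from the post.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cities.csv")  # columns: city, unsheltered_per_100k, median_house_price

fig, ax = plt.subplots()
ax.scatter(df["median_house_price"], df["unsheltered_per_100k"])
for _, row in df.iterrows():
    ax.annotate(row["city"], (row["median_house_price"], row["unsheltered_per_100k"]))
ax.set_xlabel("Median house price (USD)")
ax.set_ylabel("Unsheltered homeless per 100k residents")
ax.set_title("Is San Francisco still an outlier?")
plt.show()
```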

Beware boasting about non-existent forecasting track records

I am a forecaster on that question: the main doubt I had was if/when someone would try to do wordy things + game playing on a "single system". It seemed plausible to me that this particular combination of capabilities might never become an exciting area of research, so the date at which an AI can first do these things would then be substantially after the date at which this combination of tasks would be achievable with focused effort. Gato was a substantial update because it does exactly these tasks, so I no longer see much possibility that the benchmark is achieved only afte... (read more)

Optimization at a Distance

We both have a similar intuition about the kinds of optimizers we're interested in. You say they optimize things that are "far away", I say they affect "big pieces of the environment". One difference is that I think of big as relative to the size of the agent, but something can be "far away" even if the agent is itself quite large, and it seems that agent size doesn't necessarily matter to your scheme because the information lost over a given distance doesn't depend on whether there's a big agent or a small one trying to exert influence over this distance.... (read more)

Is evolutionary influence the mesa objective that we're interested in?

The Legg-Hutter definition of intelligence is counterfactual ("if it had X goal, it would do a good job of achieving it"). It seems to me that the counterfactual definition isn't necessary to capture the idea above. The LH definition also needs a measure over environments (including reward functions), and it's not obvious how closely their proposed measure corresponds to things we're interested in, while influentialness in the world we live in seems to correspond very closely.

The mesa-optimizer paper also stresses (not sure if correctly) that they're not t... (read more)
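For reference, the Legg-Hutter measure being alluded to has (roughly, from memory, so treat the exact form as a paraphrase) the shape

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu},
```

where E is a class of computable environments, K(\mu) is the Kolmogorov complexity of \mu, and V^{\pi}_{\mu} is the expected total reward of policy \pi in \mu. The counterfactual flavour lives in V^{\pi}_{\mu} ("how well would \pi do if placed in \mu"), and 2^{-K(\mu)} is the measure over environments whose correspondence to things we actually care about the comment questions.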

Dr Fauci as Machiavellian Boddhisattva

I got "masks are like assholes" from the sentence, even before I read the Ben's analysis.

3Eli Tyre4mo
This is a helpful datapoint for me. I've never heard the phrase that is apparently being riffed on.

I think it's very likely that the people who made the ads are deliberately alluding to "opinions are like assholes". And very unlikely that their intention is to say "masks are like assholes". I think what they're trying to do is to deliver a little surprise, a little punchline. You see "X are like opinions", some bit of your brain is expecting a rude criticism, and oh! it turns out they're saying something positive about X and something positive about having opinions. So (they hope) the reader gets a pleasant surprise and is a bit more willing to pay atte... (read more)

A Solution to the Unexpected Hanging Problem

Taking a cue from the wiki article: "you will be hanged tomorrow, and you will not be able to derive from this statement whether or not you will be hanged tomorrow"

Seems kind of weird because it is self-contradictory, and yet true.

ELK prize results

This gets good log loss because it's trained in the regime where the human understands what's going on, correct?

4paulfchristiano5mo
Yes; for this bad F, the resulting G is very similar to a human simulator.
ELK prize results

Regarding "Strategy: train a reporter which isn’t useful for figuring out what the human will believe" / "Counterexample: deliberately obfuscated human simulator":

If you put the human-interpreter-blinding before the predictor instead of between the predictor and the reporter, then whether or not the blinding produces an obfuscated human simulator, we know the predictor isn't making use of human simulation.

An obfuscated human simulator would still make for a rather bad predictor.

I think this proposal might perform slightly better than a setup where we expand the... (read more)
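A schematic of the two placements being compared; blind, predict and report are stand-ins for trained components, and the function names are mine rather than the ELK report's.

```python
from typing import Any, Callable

# Schematic only: blind / predict / report stand in for trained components;
# none of this is from the ELK report itself.

def reporter_blinded(blind: Callable, predict: Callable, report: Callable, obs: Any) -> Any:
    """Original setup: blinding sits between the predictor and the reporter,
    so the reporter may still end up an (obfuscated) human simulator."""
    return report(blind(predict(obs)))

def predictor_blinded(blind: Callable, predict: Callable, report: Callable, obs: Any) -> Any:
    """Variant in the comment: blinding is applied before the predictor, so
    whatever the predictor computes cannot be built on human simulation
    (at the likely cost of worse predictions)."""
    return report(predict(blind(obs)))
```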

A one-question Turing test for GPT-3

My 3yo: "a bowl of museli because it's heavy"

Also "an apple, plum and peach and a bit of wax"

Counterfactuals from ensembles of peers

Peers get the same results from the same actions. It's not exactly clear what "same action" or "same result" means -- is "one-boxing on the 100th run" the same as "one-boxing on the 101st run", or "box 100 with $1m in it" the same as "box 101 with $1m in it"? I think we should think of peers as being defined with respect to a particular choice of variables representing actions and results.

I think the definitions of these things aren't immediately obvious, but it seems like we might be able to figure them out sometimes. Given a decision problem, it seems to ... (read more)
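One way to write the relativised notion down (my formalisation, not the post's):

```latex
\text{Agents } i, j \text{ are peers w.r.t. } (\mathcal{A}, \mathcal{R})
\;\iff\;
\forall a \in \mathcal{A}: \; R_i(a) = R_j(a),
```

where \mathcal{A} is the chosen set of action variables and R_i(a) \in \mathcal{R} is the result agent i obtains from action a. Whether "one-boxing on the 100th run" and "one-boxing on the 101st run" count as the same a is then a modelling choice about \mathcal{A}, not a fact about the agents.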

Counterfactuals from ensembles of peers

how to respond to the temptation to shift from utilising actual peers to potential peers which then seems to reraise the specter of circularity.

I think you might be able to say something like "actual peers is why the rule was learned, virtual peers is because the rule was learned".

(Just to be clear: I'm far from convinced that this is an actually good theory of counterfactuals, it's just that it also doesn't seem to be obviously terrible)

Let's suppose I face Newcomb's problem in a yellow shirt and you face it in a red shirt. They ought to be comparable bec... (read more)
2Chris_Leong7mo
Just to see I'm following correctly, you're a peer if you obtain the same result in the same situation? My point about yellow shirts and red shirts is that it isn't immediately obvious what counts as the same situation. For example, if the problem involved Omega treating you differently by shirt color, then it would seem like we were in different situations. Maybe your response would be, you got a different result so you're not a peer, no need to say it's a different situation. I guess I would then question if someone one-boxing in Newcomb's and someone opening a random box and finding $1 million would be peers just b/c they obtained the same result. I guess completely ignoring the situation would make the class of peers too wide.
Infra-Bayesian physicalism: a formal theory of naturalized induction

Γ=Σ^R, it's a function from programs to what result they output. It can be thought of as a computational universe, for it specifies what all the functions do.

Should this say "elements are function... They can be thought of as...?"

Can you make a similar theory/special case with probability theory, or do you really need infra-bayesianism? If the second, is there a simple explanation of where probability theory fails?
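Spelling out the types as I read them (my gloss on the quoted definition, not the paper's wording):

```latex
\Gamma \;=\; \Sigma^{R} \;=\; \{\, f : R \to \Sigma \,\}, \qquad f(p) = \text{the result that program } p \text{ outputs},
```

so \Gamma itself is the set of such functions (the space of computational universes), and a single element f \in \Gamma is one computational universe; hence the suggested correction from "it's a function" to "elements are functions".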

2Vanessa Kosoy7mo
Yes, the phrasing was confusing, I fixed it, thanks. We really need infrabayesianism. On bayesian hypotheses, the bridge transform degenerates: it says that, more or less, all programs are always running. And, the counterfactuals degenerate too, because selecting most policies would produce "Nirvana". The idea is, you must have Knightian uncertainty about the result of a program in order to meaningfully speak about whether the universe is running it. (Roughly speaking, if you ask "is the universe running 2+2?" the answer is always yes.) And, you must have Knightian uncertainty about your own future behavior in order for counterfactuals to be meaningful. It is not surprising that you need infrabayesianism in order to do naturalized induction: if you're thinking of the agent as part of the universe then you are by definition in the nonrealizable setting, since the agent cannot possibly have a full description of something "larger" than itself.
Dutch-Booking CDT: Revised Argument

My confusion was: even "when the agent is acting", I think it would still be appropriate to describe its beliefs according to EDT. However, I was confused by thinking about "...and then offering a bet". As far as I can tell, this is just an unnecessary bit of storytelling set around a two step decision problem, and a CDT agent has to evaluate the prospects of each decision according to CDT.

Dutch-Booking CDT: Revised Argument

This is an old post, but my idea of CDT is that it's a rule for making decisions, not for setting beliefs. Thus the agent never believes in the outcome given by CDT, just that it should choose according to the payoffs it calculates. This is a seemingly weird way to do things, but apart from that is there a reason I should think about CDT as a prescription for forming beliefs while I am acting?

[This comment is no longer endorsed by its author]
Causality and determinism in social science - An investigation using Pearl's causal ladder

Pearl is distinguishing "intrinsically nondeterministic" from "ordinary" Bayesian networks, and he is saying that we shouldn't mix up the two (though I think it would be easier to avoid this with a clearer explanation of the difference).

Three questions:

  • Do we need determinism to define counterfactuals?

No

  • Is uncertainty represented in causal Bayesian networks typically used in social science limited to "intrinsic nondeterminism"?

No, and so we should be careful not to mix them up with "intrinsically nondeterministic" Bayesian networks

  • Is there no intrinsic nond... (read more)
1tailcalled7mo
The intrinsic quantum nondeterminism probably mostly gets washed away due to enormous averages. Of course chaos theory means that it eventually gets relevant, but by the time it gets to relevant magnitudes, the standard epistemic uncertainty has already overwhelmed the picture. So I think in comparison to standard epistemic uncertainty, any intrinsic nondeterminism will be negligible in social science. I know and agree. I sort of have some problems with/objections to counterfactuals in the presence of intrinsic nondeterminism. E.g. Y_{X=X} might not be equal to Y (and per chaos theory, would under many circumstances never be equal or even particularly close). But since intrinsic nondeterminism isn't relevant for social science anyway, I just skipped past them.
Causality and determinism in social science - An investigation using Pearl's causal ladder

Determinism is not a defining feature of counterfactuals; you can make a stochastic theory of counterfactuals that is a strict generalisation of SEM-style deterministic counterfactuals. See Pearl, Causality (2009), p. 220, "counterfactuals with intrinsic nondeterminism", for the basic idea. It's a brief discussion and doesn't really develop the theory but, trust me, such a theory is possible. The basic idea is contained in "the mechanism equations lose their deterministic character and hence should be made stochastic."

1tailcalled7mo
This doesn't work in social science. Quoting the book: Emphasis added.
$1000 USD prize - Circular Dependency of Counterfactuals

I've previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.

I think this goes too far. We can give an account of counterfactuals from assumptions of symmetry. This account is unsatisfactory in many ways - for one thing, it implies that counterfactuals exist much more rarely than we want them to. Nonetheless, it seems to account for some properties of a counterfactual and is able to stand up without counterfactual assumptions to support it. I think it also provides an interesting lens for exam... (read more)

Counterexamples to some ELK proposals

So, granting the assumption of not corrupting the humans (which is maybe what you are denying), doesn't this imply that we can go on adding sensors after the fact until, at some point, the difference between fooling them all and being honest becomes unproblematic?

Counterexamples to some ELK proposals

Do you run into a distinction between benign and malign tampering at any point? For example, if humans can never tell the difference between the tampered and non-tampered result, and their own sanity has not been compromised, it is not obvious to me that the tampered result is worse than the non-tampered result.

It might be easier to avoid compromising human sanity + use hold-out sensors than to solve ELK in general (though maybe not? I haven't thought about it much).

2paulfchristiano8mo
In some sense this is exactly what we want to do, and this is why we are happy with a very "narrow" version of ELK (see the appendices on narrow elicitation and why it might be enough [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.ii599facmbks] , indirect normativity [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.3y1okszgtslx] , and avoiding subtle manipulation [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.3gj06lvtpme7] ). But you still need to care about some sensor tampering. In particular, you need to make sure that there are actually happy humans deliberating about what to do (under local conditions that they believe are conducive to figuring out the answer), rather than merely cameras showing happy humans deliberating about what to do.
Generalizing Koopman-Pitman-Darmois

I'm a bit curious about what job "dimension" is doing here. Given that I can map an arbitrary vector in ℝ^n to some point in ℝ via a bijective measurable map (https://en.wikipedia.org/wiki/Standard_Borel_space#Kuratowski's_theorem), it would seem that the KPD theorem is false. Is there some other notion of "sufficient statistic complexity" hiding behind the idea of dimensionality, or am I missing something?
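To spell out the worry (my formalisation, not the comment's): by Kuratowski's theorem all uncountable standard Borel spaces are Borel-isomorphic, so

```latex
\exists\, f : \mathbb{R}^{n} \to \mathbb{R} \ \text{bijective and Borel-measurable}, \qquad T(x) := f(x),
```

which would give a "one-dimensional" sufficient statistic obtained purely by encoding, making a dimension bound with no regularity condition vacuous; the reply below points to the differentiability assumption as the missing condition.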

3johnswentworth10mo
There's a smoothness assumption. I assumed differentiability, although that could be weakened somewhat. (The assumption is hidden in the sister post to this one, The Additive Summary Equation [https://www.lesswrong.com/posts/E4GvMdELt6s6CaXrb/the-additive-summary-equation] .)