Philosopher working in Melbourne, Australia. My book Meaning and Metaphysical Necessity is forthcoming in June 2022 with Routledge.
Would be good to see some more references and discussion of illusionism as a view in its own right. For my money the recent work of Wolfgang Schwarz on imaginary foundations and sensor variables gives a powerful explanation of why we might have this illusion.
I'd be interested to hear how this compares with Wolfgang Schwarz's ideas in 'Imaginary Foundations' and 'From Sensor Variables to Phenomenal Facts'. Sounds like there's some overlap, and Schwarz has a kind of explanation for why the hard problem might arise that you might be able to draw on.
Link to the second of the papers mentioned: https://www.umsu.de/papers/sensorfacts.pdf
Very interesting. I'm stuck on the argument about truthfulness being hard because the concept of truth is somehow fraught or too complicated. I'm envisaging an objection based on the T-schema ('<p> is true iff p').
>Now, in real life, building a truthful AGI is much harder than building a diamond optimizer, because 'truth' is a concept that's much more fraught than 'diamond'. (To see this, observe that the definition of "truth" routes through tricky concepts like "ways the AI communicated with the operators" and "the mental state of the operators", and involves grappling with tricky questions like "what ways of translating the AI's foreign concepts into human concepts count as manipulative?" and "what can be honestly elided?", and so on, whereas diamond is just carbon atoms bound covalently in tetrahedral lattices.)
But this reference to "the definition of 'truth'" seems to presuppose some kind of view about truth; I'm not sure what that view is, but I know it's bound to be philosophically controversial.
Some think that 'true' can be defined by taking all the instances of the T-schema, or a (perhaps restricted) universal generalisation of it.
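To make the idea vivid (this is my gloss, not something from the OP), the deflationary definitions in question look like this:

```latex
% Each (non-paradoxical) sentence p yields an instance of the T-schema:
\[
  \mathrm{True}(\langle p \rangle) \leftrightarrow p
\]
% On the generalisation view, 'true' is instead defined by a suitably
% restricted universal quantification over sentences:
\[
  \forall p \, \bigl( \mathrm{True}(\langle p \rangle) \leftrightarrow p \bigr)
\]
```

The restriction matters because unrestricted instances (the liar sentence, for example) lead to paradox, but that's a separate issue from the fraughtness worry in the OP.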
And this seems not totally crazy or irrelevant from an AI design perspective, at least at first blush. I feel I can sort of imagine an AI obeying a rule which says to assert <p> only if p.
Trying to envisage problems and responses, I hit the idea that the AI would have degrees of belief or credences, and not simply a list of things it thinks are true simpliciter. But perhaps it can have both. And perhaps obeying the T-schema based truthfulness rule would just lead it to confine most of its statements to statements about its own credences or something like that.
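The "both credences and flat assertions" idea can be sketched very crudely. This is my own toy illustration, not anything from the OP: the agent, its threshold, and the fallback to credence reports are all hypothetical design choices.

```python
# Toy sketch: an agent that follows a T-schema-style truthfulness rule by
# flatly asserting <p> only when its credence in p is near-certain, and
# otherwise confining itself to statements about its own credences
# (which it can assert flatly, being certain of its own state).

from dataclasses import dataclass, field

ASSERT_THRESHOLD = 0.99  # hypothetical cutoff for flat assertion


@dataclass
class TruthfulAgent:
    credences: dict = field(default_factory=dict)  # proposition -> degree of belief

    def statement(self, proposition: str) -> str:
        # Assumed ignorance prior of 0.5 for propositions never considered.
        c = self.credences.get(proposition, 0.5)
        if c >= ASSERT_THRESHOLD:
            return proposition  # flat assertion of p itself
        # Retreat to a report about the agent's own credence.
        return f"my credence in '{proposition}' is {c:.2f}"


agent = TruthfulAgent({"snow is white": 0.999, "it will rain tomorrow": 0.6})
print(agent.statement("snow is white"))          # flat assertion
print(agent.statement("it will rain tomorrow"))  # credence report
```

As the worry in the text suggests, an agent like this ends up making mostly autobiographical statements about its credences, which is truthful but not obviously useful.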
I think I see a separate problem about ensuring the AI does not (modify itself in order to) violate the T-schema based truthfulness rule. But that seems different at least from the supposed problem in the OP about the definition of 'true' being fraught or complicated or something.
If it wasn't already clear I'm a philosophy person, not an alignment expert, but I follow alignment with some interest.
This is an instance of an arc that clever people have been going through for ages, so I'd like to see more teasing apart of the broader phenomenon from the particular historical episode of the Sequences etc.
A lot of the mixed feelings, and the lack of identification as rationalists on the part of many people who found the Sequences interesting reading, are to be explained by their perceiving the vibe you describe and being aware of its pitfalls.
Interesting to read, here are a couple of comments on parts of what you say:
>the claim that all possibilities exist (ie. that counterfactuals are ontologically real)
'counterfactuals are ontologically real' seems like a bad way of re-expressing 'all possibilities exist'. Counterfactuals themselves are sentences or propositions, and even people who think there's e.g. no fact of the matter with many counterfactuals should agree that they themselves are real.
Secondly, most philosophers who would be comfortable with talking seriously about possibilities or possible worlds as real things would not go along with Lewis in holding them to be concrete. The view that possibilities really truly exist is quite mainstream and doesn't commit you to modal realism.
>what worlds should we conceive of as being possible? Again, we can make this concrete by asking what would happen if we were to choose a crazy set of possible worlds - say a world just like this one and then a world with unicorns and fountains of gold - and no other worlds
I think it's crucial to note that it's not the presence of the unicorns world that makes trouble here, it's the absence of all the other ones. So what you're gesturing at is, I think, the need for a kind of plenitude in the possibilities one believes in.
Ramsey could be on the list too but I guess his tragically short life makes it hard to do some of the cells.
Maybe a bit off-colour to call the fact that three of Wittgenstein's brothers committed suicide 'delicious'...
Wittgenstein had so many ideas and is such a difficult thinker that I think one ought to read him before secondary sources. Also he's a wonderful writer.
I think there's a potentially confusing fact which you're neglecting in this post, namely the reality of literature as territory, not map. If you're interested in literature, then when you read it you get lots of knowledge of what e.g. certain books contain and what certain authors wrote, and that can be very instructive not just within literature. I'd like to see you and others with this viewpoint wrestle more with considerations like this.
Why is it OK to use the deduction theorem, though? In standard modal logics like K and S5 the deduction theorem doesn't hold (otherwise you could assume P, use necessitation to get □P, and then use the deduction theorem to get P → □P as a theorem).
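Spelling out that familiar argument step by step (my reconstruction, not anything in the post):

```latex
\[
\begin{aligned}
&\text{1. } P \vdash P
  && \text{assumption}\\
&\text{2. } P \vdash \Box P
  && \text{illicit: necessitation is licensed only for theorems, } \vdash P\\
&\text{3. } \vdash P \rightarrow \Box P
  && \text{deduction theorem applied to 2}
\end{aligned}
\]
```

Since P → □P is not a theorem of K (truths need not be necessary), one of the steps must be blocked; the usual diagnosis is that an unrestricted deduction theorem is what lets the assumption slip past the theorems-only restriction on necessitation.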