You are probably right. For someone arguing the benefits of AI, I certainly can't accuse this writer of being misleadingly optimistic.
But personally I've recently found it quite disconcerting how bleak a future the people who work in AI (on both sides of the capabilities/safety divide) seem to be willing to work towards building.
Overcoming this kind of reflexive defeatism seems to me much harder than simply convincing people that, as a matter of fact, we are going in a bad direction.
You're absolutely right to focus on the moment the model fails. Updating your model to account for its failures is effectively what learning is. Again, if we look at you from the outside, we can give an account of the form: the model failed because it did not correspond to reality, so the agent updated it to one which corresponded better to reality (AKA was more true).
But again, from the inside there is no access to reality, only the model. Perception and prediction are both mediated by the model itself, and when they contradict each other the model must be adjusted. But that the perceptions come from a 'real' external world is itself just a feature of the model.
You have the extraordinary ability to change your own model in response to its contradictions. Let's consider the case of agents that can't do that.
If a roomba is flipped on its back and its wheels keep spinning (I imagine in real life roombas probably have some kind of sensor to deal with these situations, but let's assume this one doesn't), from the outside we can say that the roomba's model, which says that spinning your wheels makes you move, is no longer in correspondence with reality. But from the point of view of the roomba, all that can be said is that the world has become incomprehensible.
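The contrast can be made concrete with a toy simulation (entirely hypothetical; the numbers and the update rule are just illustrative, not anything like real robot firmware):

```python
# Toy sketch: a fixed-model agent vs. an agent that can update its model.
# (Real roombas have wheel-drop sensors; this hypothetical one doesn't.)

def run_agent(flipped, can_update, steps=5):
    # The model: "spinning my wheels moves me at this speed."
    predicted_speed = 1.0
    surprises = 0
    for _ in range(steps):
        observed_speed = 0.0 if flipped else 1.0  # perception
        if observed_speed != predicted_speed:     # prediction error
            surprises += 1
            if can_update:
                # Learning: revise the model to match perception.
                predicted_speed = observed_speed
            # Otherwise the model stays fixed and the same error
            # recurs forever: from the inside, the world is simply
            # incomprehensible, step after step.
    return surprises

print(run_agent(flipped=True, can_update=True))   # 1: one surprise, then adapts
print(run_agent(flipped=True, can_update=False))  # 5: surprised every step
```

From the outside we describe both agents as having a model that does or doesn't correspond to reality; but only the updating agent ever gets its contradiction resolved.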
On the other hand, there's another concern I've been wary of in the context of AI safety startups (which is what I'm currently exploring) and research in general: following the short-term success gradient. In startups, you can start with a noble vision and then be increasingly pressured away from it simply because you are pursuing the customer gradient and "building what people want." If your goal is large-scale (venture) success, then it only makes sense: you need customers and traction for your Series A, after all. Even in research, there's only so much fucking around you can do until people want something legible from you.
This is my biggest concern with d/acc-style techno-optimism: it seems to assume that genuinely defensive technologies can compete economically with offensive ones (all it takes is the right founders, seed funding, etc.).
Whereas my impression is that any kind of ethical/ideological commitment immediately puts a startup at a massive structural disadvantage against those who choose simply to give the market what it wants (acceleration).
One additional consideration wrt the notebook:
Unlike you I am still very ambivalent about note taking.
I got through most of my education relying on my (very much imperfect) memory, forgetting lots but generally remembering enough to get by, and I always felt that taking notes during a lecture, for example, distracted too much from actually listening.
Then, at some point a couple of years ago, I got fed up with having to relearn the same things multiple times and started using Obsidian to try and systematically take notes on everything I read.
But recently I have been feeling that the transfer from mental representations to text is far too lossy, and that text remains static while remembered information can be morphed and readapted dynamically with new information and new contexts. What's worse, using the notebook really does externalise memory, in the sense that once I convert my ideas into text my mind seems to let go of the richer mental representations and retain either nothing or just the compressed textual ones.
So using the notebook feels a bit like I am deferring agency to another OIS that has much better memory (storage) than me, but is also probably stupider.
(will probably try to respond to some of the rest later)
- Bob gives good advice.
- Bob gives bad advice.
- Bob is a skilled manipulator and deliberately says things that will make Alice do...
  - what is in his interest.
  - what he thinks is in her interest.
  - what his values say she should do.
  - what he thinks her values say she should do.
- Bob wants and advises Alice to do what he thinks she should do (based on his own values).
  - Bob is highly convincing and Alice does what he suggests.
    - They have the same values.
    - They have different values.
  - Alice is not convinced by Bob; responding to his advice helps her clarify what she thinks she should do.
  - Bob's advice changes Alice's values.
- Bob tries to figure out Alice's values and then advises her based on that.
  - He gets it wrong.
  - He gets it right...
    - because he knows her well and asks lots of relevant questions.
    - by pure luck.
- Bob believes that only she knows her own values, so he...
  - tells her he cannot help her.
  - tells her he cannot give advice, but he can tell her some facts he knows that may help her make the decision for herself.
    - Equipped with this new information, Alice is able to make a decision that better reflects her own values.
    - Bob carefully selects facts that push her towards a specific choice, while censoring ones that won't.
    - Bob tells her everything he knows, but for contingent reasons of selection (such as what kind of facts Bob is interested in) these only include facts that push her towards a specific choice, and exclude ones that won't.
    - The new knowledge contradicts some of Alice's pre-existing beliefs about the problem...
      - and she can now make a better informed decision.
      - and she is now even more confused about what to do than before.
- Bob is an omniscient god and tells Alice every fact about the universe.
  - Equipped with this new information, Alice is able to make a decision that better reflects her own values.
  - Equipped with this new information, Alice realises she holds contradictory values that point to different courses of action.
  - Now she has ascended to omniscience, Alice no longer cares about the problem.
- Bob tells Alice to ask Charlie.
- Bob tells Alice to ask ChatGPT.
- Bob asks ChatGPT and then passes the response off as his own.
- Bob is a rubber duck and says nothing.
I think that when seen from outside of the agent, your account is correct. But from the perspective of the agent, the world and the world model are indistinguishable, so the relationship between prediction and time is more complex.
I don't think thermostat consciousness would require homunculi any more than human consciousness does but I think it was a mistake on my part to use the word consciousness as it inevitably complicates things rather than simplifying them (although FWIW I do agree that consciousness exists and is not an epiphenomenon).
For the thermostat (assuming the bimetallic strip type), the reference is the position of a pair of contacts on either side of the strip; the temperature determines the curvature of the strip, which makes or breaks the contacts, which turns the heating on or off. This is all physically well understood. There is nothing problematic here.
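The bimetallic mechanism is just a physically implemented bang-bang controller. A minimal sketch of the same loop in software (illustrative numbers, not any real device; the hysteresis stands in for the gap between the contacts):

```python
# Bang-bang control as the bimetallic strip implements it physically:
# curvature tracks temperature; contacts make/break around a reference.

def simulate(reference=20.0, hysteresis=0.5, temp=15.0, steps=40):
    heating = False
    history = []
    for _ in range(steps):
        # The "contacts": make below the band, break above it,
        # and keep their current state inside the dead band.
        if temp < reference - hysteresis:
            heating = True
        elif temp > reference + hysteresis:
            heating = False
        # Crude room dynamics: heater warms, environment cools.
        temp += 0.4 if heating else -0.3
        history.append(temp)
    return history

temps = simulate()
# After settling, the temperature oscillates around the 20.0 reference.
```

The point is that nothing in this loop requires a homunculus: the "perceived delta" is the comparison against the reference, and the action follows from it causally.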
For me acting as the thermostat, I perceive the delta, and act accordingly. I don't see anything problematic here either. The sage is not above causation, nor subject to causation, but one with causation. As are we all, whether we are sages or not.
The thermostat too is one with causation. The thermostat acts in exactly the same way as you do. It is possibly even already conscious (I had completely forgotten this was an established debate, and it's absolutely not a crux for me). You are much more complex than a thermostat.
I think there is something a bit misleading about your example of a person regulating the temperature in their house manually. The fact that you can consciously implement the control algorithm does not tell us anything about your cognition or even your decision-making process, since you can also implement pretty much any other algorithm (you are more or less Turing complete, subject to finiteness etc.). PCT is a theory of cognition, not simply of decision making.
I like this ontology.
Although I wonder if having such a general definition, one that applies to so many things and so many different kinds of things, causes it to start losing meaning, or at least demands some further subdividing.
Also it seems like maybe there is a point at which a sharp line cannot be drawn between two OISs that overlap too much. E.g. while I am willing to recognise that the me OIS and the me + notebook and pen OIS are in some sense meaningfully distinct, it seems like they have some very strong relation, possibly some hierarchy, and the second may not be worth recognising as distinct in practice.
You are right that I am being a bit reductive. Maybe it would be better to say it assumes that some kind of ideal combination of innovation, markets and technocratic governance would be enough to prevent catastrophe?
And to be clear, I do think it's much better for people to be working on defensive technologies than not to. And it's not impossible that the right combination of defensive entrepreneurs and technocratic government incentives could genuinely solve a problem.
But I think this kind of faith in "business as usual, but a bit better" can lead to a kind of complacency where you conflate working on good things with actually making a difference.