Really nice comment that I also happen to agree with. As a programmer working with Claude Code and Cursor every day, I have yet to see AI systems achieve "engineering taste", which seems far easier than the "research taste" discussed by the OPs. In my experience, these systems cannot perform medium-term planning and execution of tasks, even those that are clearly within distribution.
Perhaps the apparent limitations relate to the independent probability of things going wrong when you aren't maintaining a consistent world-model / in-line learning and feedback.
For example, even if 0.90 of your actions are correct, and any single incorrect action can independently tank the task, then your probability of success after 6 actions is only 0.9^6 ≈ 0.53.
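As a rough illustration of that compounding effect, here is a minimal sketch, assuming each step succeeds independently with the same probability p (an assumption of this toy model, not something the comment claims precisely):

```python
# Toy model: probability a multi-step task succeeds when each step
# independently succeeds with probability p (illustrative assumption).
def task_success_probability(p: float, n_steps: int) -> float:
    return p ** n_steps

for n in (1, 3, 6, 12):
    print(n, round(task_success_probability(0.9, n), 2))
# 1 0.9, 3 0.73, 6 0.53, 12 0.28
```

Even with a fairly high per-step accuracy, the chance of getting through a longer sequence of actions without derailing drops off quickly.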