Seconding this observation.
As a practical note, I find it useful, in addition to telling the LLM to avoid certain patterns as you mention, to also tell it to go back over its changes and rewrite, for example, code that returns a placeholder value in a failure case into code that throws a descriptive error. There seem to be many cases where the LLM can't follow all of your instructions in one shot, but can recognize that failure and course-correct after the fact.
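To illustrate the kind of rewrite I ask for, here is a made-up example (the function names and the port-parsing scenario are mine, not from the original): the "before" version returns a placeholder on failure, and the "after" version throws a descriptive error instead.

```typescript
// Before: silently returns a placeholder when parsing fails.
// Callers cannot distinguish "the input was 0" from "the input was garbage".
function parsePortBefore(raw: string): number {
  const port = Number(raw);
  return Number.isInteger(port) ? port : 0; // 0 is not a real port
}

// After the review pass: fail early with a descriptive error.
function parsePort(raw: string): number {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port ${JSON.stringify(raw)}: expected an integer in 1-65535`);
  }
  return port;
}
```

In practice, a follow-up prompt like "re-read your diff and replace any placeholder-on-failure returns with thrown errors" is often enough to trigger this rewrite.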
Modern large language models go through a battery of reinforcement learning where they are trained not to produce code that fails in specific, easily detectable ways, like crashing the program or causing failed unit tests. Almost universally, this means these models have learned to produce code that looks like this:
function getSomeParticularNumber(connection: SqlConnection): number {
  try {
    const x = connection.query(`SELECT "no" FROM accounts`);
    return x[0];
  } catch (err) {
    logger.warn("Couldn't get the number; returning 0 by default so as not to break things");
    return 0;
  }
}
The above code is very cheeky, and quite bad. It's more likely to pass integration tests than it would be without the try/catch block, but only because it fails silently. Callers of getSomeParticularNumber won't know whether 0 is what's actually in the database or whether there was an intermittent connection error - which could be catastrophic if the number is, say, the price of an item in a store. And if it turns out this code contains a bug (for example, if the table should be "Accounts" instead of "accounts"), testers might not notice until it's actually impacting users.
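One way the same function might look after removing the silent fallback - sketched here with a minimal stand-in for the SqlConnection interface, since its real shape isn't given above - is to let the query error propagate and to fail loudly if the expected row is missing:

```typescript
// Minimal stand-in for the SqlConnection used above (assumption: query()
// returns an array of row objects keyed by column name).
interface SqlConnection {
  query(sql: string): Array<{ no: number }>;
}

// The query error, if any, propagates to the caller; a missing row becomes
// a loud, descriptive error instead of an invented default of 0.
function getSomeParticularNumber(connection: SqlConnection): number {
  const rows = connection.query(`SELECT "no" FROM accounts`);
  if (rows.length === 0) {
    throw new Error('Expected at least one row in "accounts", found none');
  }
  return rows[0].no;
}
```

Now a typo'd table name or a dropped connection surfaces immediately in testing, rather than masquerading as a price of 0.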
Some common ways I've seen this behavior manifest include the use of hasattr or getattr in untyped languages.

Until reinforcement learning environments get much, much better, these models will probably continue to do this. To help prevent this kind of behavior, I include custom instructions for most of my terminal coders:
When key assumptions that your code relies upon to work appear to be broken, fail early and visibly, rather than attempting to patch things up. In particular:
* Lean towards propagating errors up to callers, instead of silently "warning" about them inside of try/catch blocks.
* If you are fairly certain data should always exist, assume it does, rather than producing code with unnecessary guardrails or existence checks (especially if such checks might mislead other programmers).
* Avoid the use of hasattr/getattr (or non-Python equivalents) when accessing attributes and fields that should always exist.
* Never produce invalid "defaults" as a result of errors, either for users or for downstream callers.