Detailed instructions for downloading and using the skill can be found on GitHub here
I built a Claude skill to comment on docs. It gives Claude instructions for writing good comments and a script for adding those comments to the doc. This currently only works with Word docs. To add comments to a Google Doc, you'll need to first download it as a Word doc, then either upload it to Claude.ai or use Claude (Code or App) locally.[1] Alternatively, you could copy-paste your document into Claude.ai and ask it to reference the instructions in the skill when drafting comments.[2]
Yes, that's a bit tedious. However, I believe that Claude's comments are decent enough to be worth the hassle, especially if you're in the early stages of writing.
Here is a (lightly cherry-picked) example comment it left on Julian Stastny's post Two proposed projects on abstract analogies for scheming:
Content: The term 'abstract analogies' is appealing but underspecified. What makes an abstract analogy good vs. bad? You could strengthen this by giving criteria — e.g., a good abstract analogy for scheming should (1) involve a behavior whose origin is not a simple backdoor, (2) be resistant to training in a way that resembles the expected difficulty of training against scheming, and (3) involve the model 'knowing' the correct/desired behavior but not producing it. Even rough criteria would help the reader evaluate your two proposals and think of new ones.
My instructions ask Claude for comments on content, comments on style/clarity/grammar, and a top-level comment on the post as a whole. Here's a (lightly cherry-picked) example of a style comment on the same piece, on the sentence "An interesting question, therefore, is whether it’s possible to train models to reveal harmful information using methods other than supervised fine-tuning on ground truth harmful information":
Style: The connection to sandbagging could be made tighter here. Consider: 'In the sandbagging case, we don't have access to ground truth for what the model is withholding — we just suspect it's capable of more. Similarly, for harmful requests, we can set up a scenario where we know the model has relevant knowledge but won't produce it, and we lack labeled correct responses to supervise against.' This would make the structural parallel more vivid for the reader.
The instructions are aimed at helping Claude give feedback on conceptual AI safety research.[3] You can also use the skill for other posts. Here's a (cherry-picked) example from its comments on GeneSmith's post The optimal age to freeze eggs is 19 (on Figure 1):
Content: This figure shows monthly probability of natural conception from unprotected sex, which is driven by egg quality but also by factors irrelevant to IVF (coital frequency patterns, cervical mucus changes, etc.). Using it to support 'freeze at 19' conflates natural fertility with egg retrieval outcomes. The peak around 19-20 is partly driven by these non-egg-quality factors. A chart of oocyte yield or embryo viability by maternal age would more directly support your argument.
Not all of the comments are amazing. Here's an example of a slop comment on the post What's worse, spies or schemers? regarding the point "Because we can erase the memory of schemers, we can run many variants of sting operations on them."
Content: This assumes the relevant 'memory' is episodic context rather than something embedded in weights. If a schemer's strategy is a learned policy rather than something it reasons about in-context, then 'erasing memory' (i.e. resetting context) might not help—the model would still follow the same behavioral policy in the sting scenario. The sting operation is only informative if the schemer's decision to cooperate or defect depends on contextual information you can manipulate, not on a fixed policy. Worth flagging this limitation.
(This comment doesn't make much sense to me.)
However, I think the overall ratio of productive comments to slop is high enough that I expect to use this skill on most future documents I write. Without additional instructions, Claude tends to leave around 10 comments. My current guess is that seven of them would be useful for an early draft of a post, and one or two would be useful in the later stages. (You can ask for more or fewer comments.)
I'm new to prompt/context/agentic engineering (or whatever the latest buzzword is), so let me know if you have any ideas on how to improve the skill!
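For anyone curious about the mechanics: a .docx file is a zip archive whose main text lives in word/document.xml, so a script can find the passage a comment refers to using only the standard library. The following is a minimal sketch of that lookup step, not the skill's actual script. The helpers paragraph_texts, find_anchor, and make_minimal_docx are hypothetical names, and the demo archive is deliberately stripped down rather than a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Optional
from xml.sax.saxutils import escape

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes) -> List[str]:
    """Return the plain text of each paragraph in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is split across <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def find_anchor(paragraphs: List[str], quote: str) -> Optional[int]:
    """Index of the first paragraph containing `quote`, or None if absent."""
    for i, text in enumerate(paragraphs):
        if quote in text:
            return i
    return None


def make_minimal_docx(paragraphs: List[str]) -> bytes:
    """Build a stripped-down archive for the demo (assumption: not a full Word file)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()


demo = make_minimal_docx([
    "Spies and schemers differ in several ways.",
    "Because we can erase the memory of schemers, we can run many variants "
    "of sting operations on them.",
])
texts = paragraph_texts(demo)
print(find_anchor(texts, "erase the memory of schemers"))  # prints 1
```

Attaching the comment itself would additionally require writing to the document's separate comments part, which this sketch leaves out.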
[1] Everyone I know who's tried working with the Google Docs API has had a rough experience and failed to get it to work. Let me know if you manage to get it to work, though!
[2] In this case, the output would just be a list of comments in Markdown text, as opposed to comments that are attached to the doc.
[3] Among other things, the instructions include the introduction of Joe Carlsmith's post Fake Thinking and Real Thinking.
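When you copy-paste a document instead of uploading it, the comments come back as Markdown text rather than attached to the doc. As a rough illustration of what such a list could look like, here is a minimal sketch. The Comment fields and the exact layout are my assumptions for illustration, not a format the skill prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    kind: str   # e.g. "Content" or "Style", the labels used in the examples above
    quote: str  # the passage the comment refers to
    text: str   # the comment itself


def to_markdown(comments: List[Comment]) -> str:
    """Render comments as a numbered Markdown list, quoting each anchor passage."""
    lines = []
    for i, c in enumerate(comments, start=1):
        lines.append(f'{i}. **{c.kind}** (on "{c.quote}"): {c.text}')
    return "\n".join(lines)


example = [
    Comment("Style", "An interesting question", "Tighten the link to sandbagging."),
    Comment("Content", "abstract analogies", "Give criteria for a good analogy."),
]
print(to_markdown(example))
```

Quoting the passage next to each comment stands in for the anchoring you would otherwise get from a comment attached to the Word doc.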