lennie · LessWrong profile

Comments (sorted by newest)
On: LessWrong FAQ · lennie · 14d · 10 points

Cool! Thanks Ruby!
Yes, would be keen to play around with that! 
What sort of tech stack are you using for this?

Your idea of Claude integration was super interesting - and got me thinking about how best to let arbitrary LLMs interface with LW/AF content.
So I asked Claude about it - see this chat.
Claude suggested that building a custom MCP server might be 'straightforward' - and would allow anyone using the Claude API to immediately use the MCP server to access a structured form of LW/AF output.
It would be ideal to have this as a built-in (perhaps optional) feature of the Claude web interface, but that would require Anthropic's buy-in.
How excited are you about these directions?
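For concreteness, here is a hedged sketch of what a single tool on such a server might advertise to clients. The tool name, description, and input fields are my own invention; only the outer name / description / inputSchema shape follows the MCP tool specification.

```python
# Hypothetical tool definition for a LessWrong/AF MCP server.
# "lw_get_post" and its input fields are invented for illustration;
# the name/description/inputSchema layout follows the MCP tool spec.
lw_get_post_tool = {
    "name": "lw_get_post",
    "description": "Fetch a LessWrong or Alignment Forum post as markdown.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "post_id": {
                "type": "string",
                "description": "The post's LessWrong _id",
            },
        },
        "required": ["post_id"],
    },
}
```

A server exposing tools like this could serve any MCP-capable client, not just the Claude API - which is part of the appeal.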

Re making exporting easy: I think a 'paste to markdown' button would still be helpful - and I'd use it a lot if available, even without LLM integrations. Is anyone else also interested?

 

On: LessWrong FAQ · lennie · 23d · 10 points

I'm wondering about best practices for pasting content from LW into LLM context.

I'd like to have a Q&A with Claude about a set of posts, and am looking for a good text representation to paste into context. Web search APIs still seem a bit hit and miss, and I think a custom solution could have value. Something like a 'paste to markdown' button would be ideal.
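To make the idea concrete, here is a minimal stdlib-only sketch of the conversion step such a button would need. The class and function names are mine, and it handles only a handful of tags; a real feature would have to cover LessWrong's full post HTML (footnotes, math, images, nested lists, etc.).

```python
# Minimal sketch of an HTML -> markdown conversion pass, assuming the
# post HTML is already in hand (e.g. copied from the page). Standard
# library only; covers just bold, italics, links, headings, lists,
# and paragraphs.
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Convert a small subset of post HTML to markdown."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append("[")
            self.href = dict(attrs).get("href", "")
        elif tag in ("h1", "h2", "h3"):
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href})")

    def handle_data(self, data):
        self.out.append(data)

    def to_markdown(self):
        return "".join(self.out).strip()


def html_to_markdown(html: str) -> str:
    parser = MarkdownConverter()
    parser.feed(html)
    return parser.to_markdown()
```

For example, `html_to_markdown("<p>See <a href='https://x.y'>this</a> <em>post</em>.</p>")` returns `"See [this](https://x.y) *post*."`.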

I think this code from 2022, which I found linked from this comment thread, might be a good first pass.

Does this sound like a good idea? 
If so, might Lightcone Infrastructure be interested in creating such a feature?

Posts

- Which differences between sandbagging evaluations and sandbagging safety research are important for control? · 1 point · 1d · 0 comments
- Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions. · 1 point · 2d · 0 comments
- [Question] Feedback request: Is the time right for an AI Safety stack exchange? · 22 points · 12d · 0 comments