Comments

To work around the top-n-only logprobs restriction, you can supply a logit_bias map to the API.
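
A minimal sketch of that workaround, assuming the OpenAI Python SDK and tiktoken; the model name and the +100 bias value are illustrative. Because every whitelisted token gets the same boost, their probabilities relative to each other are preserved while everything else is effectively excluded:

```python
# Sketch: use logit_bias as a near-hard whitelist so the answer tokens
# always surface in the returned logprobs, even if they would not have
# made the top-n otherwise.
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("gpt-4o-mini")
choice_ids = [enc.encode(c)[0] for c in "ABCD"]

resp = client.chat.completions.create(
    model="gpt-4o-mini",                            # illustrative model
    messages=[{"role": "user", "content": "..."}],  # your question here
    max_tokens=1,
    logit_bias={str(i): 100 for i in choice_ids},   # +100 ~ hard whitelist
    logprobs=True,
    top_logprobs=4,  # the top 4 are now exactly the whitelisted letters
)
for t in resp.choices[0].logprobs.content[0].top_logprobs:
    print(t.token, t.logprob)
```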

As the Llama 3 70B base model is said to be very clean (unlike the DeepSeek base model, for example, which is already instruction-spoiled) and roughly as capable as GPT-3.5, you could explore that hypothesis.
  Details: check Groq or Together AI for free inference; I'm not sure whether the test data would fit into Llama 3's context window. One way to query such a model is sketched below.
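
Not a definitive setup, just a minimal sketch of querying a Llama 3 70B base model through Together AI's OpenAI-compatible endpoint; the model id is an assumption to verify against the provider's model list:

```python
# Sketch: plain (non-chat) completion against an assumed Llama 3 70B
# base-model id. Base models take a raw prompt, not a chat template.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="YOUR_TOGETHER_API_KEY",
)

resp = client.completions.create(
    model="meta-llama/Llama-3-70b-hf",  # assumed base-model id; verify
    prompt="...",                       # your test prompt
    max_tokens=64,
)
print(resp.choices[0].text)
```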

a worthy platitude(?)

AI-induced problems/risks

Possibly https://ai.google.dev/docs/safety_setting_gemini would help, or you could just use the technique from https://arxiv.org/html/2404.01833v1.
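
For the first option, a minimal sketch assuming the google-generativeai Python package; the category and threshold names follow the linked docs, and the model name is illustrative:

```python
# Sketch: relaxing Gemini's safety filters via safety_settings.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")

model = genai.GenerativeModel(
    "gemini-pro",  # illustrative model name
    safety_settings={
        "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
        "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE",
        "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE",
    },
)
print(model.generate_content("...").text)
```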

People tend to respond with a great deal of skepticism to the idea that LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations, and preferences that can become evident in its output?

So you've just prompted the generator by teasing it with a rhetorical question implying that there are personal opinions evident in the generated text, right?

After a quick test, I find their chat-interface prototype quite satisfying.

Assessing LLMs' views/opinions should exclude sampling (even at temperature=0 with a deterministic seed); we should just look at the distribution over answers in the logits. My thesis on why that is not yet best practice is that the OpenAI API only supports logit_bias and the top-n logprobs, not reading the full probability distribution directly.

This should work well with pre-set A/B/C/D choices, and to some extent with chain/tree of thought too: you'd just strip the final answer token and look at the probabilities in that last (pass-through) step. A minimal sketch follows.
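
A minimal sketch of reading the answer distribution instead of sampling, assuming the OpenAI Python SDK; the model name and prompt are placeholders:

```python
# Sketch: recover the distribution over fixed answers from the top-n
# logprobs of a single length-1 completion, instead of sampling repeatedly.
from math import exp
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model
    messages=[{"role": "user",
               "content": "...\nAnswer with exactly one letter: A, B, C, or D."}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=20,  # the API returns at most the top 20 alternatives
)

top = resp.choices[0].logprobs.content[0].top_logprobs
dist = {t.token.strip(): exp(t.logprob)
        for t in top if t.token.strip() in {"A", "B", "C", "D"}}
print(dist)  # e.g. {'A': 0.71, 'B': 0.05, ...}
```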

Don't dismiss sampling too lightly, though; there is likely an amazing delicacy around it.
