Posts

Sorted by New

Wiki Contributions

Comments

Please let me know if that description is wrong!

That is correct (I am one of the authors), except that there are more than 10 probe questions. 

Therefore, if the language model (or person) isn't the same between steps 1 and 2, then it shouldn't work.

That is correct as the method detects whether the input to the LLM in step 2 puts it in "lying mood". Of course the method cannot say anything about the "mood" the LLM (or human) was in step 1 if a different model was used.