So, Oars, Bribes, one cannon, and then either Axes or another 2 cannons. I'm guessing the cannons will do more, based on extrapolating from Crab vs Nessie.
There have been quite a few studies that show that people can self-heal in remarkable ways.
Summary article: https://www.medicalnewstoday.com/articles/306437#what-is-the-placebo-effect
Summary of the summary: with three treatment groups, outcomes for the active-treatment group exceed those of the sugar-pill group, which in turn exceed those of the no-treatment group. Some studies have even gone a step further: bring people in, tell them openly that the pill they're getting is a sugar pill with no medicine in it, and compare them to a second group that gets nothing at all. The group doing something medicine-adjacent (swallowing a pill they know is inert) still has significantly (p < .05) better outcomes than the group getting nothing.
This can even be entirely honest. Even if everybody really does have common knowledge that X is a lie, they probably don't agree on what the actual truth is, and acting as if the known lie X is true can be a compromise position to get stuff done, as long as X isn't too far away from the truth.
I object to the characterization that it is "getting the right idea." It seems to have latched on to "given a foo of bar" -> "@given(foo.bar)" and that "assert" should be used, but the rest is word salad, not code.
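For contrast, here is a minimal sketch of what working Hypothesis code actually looks like. The sorted-list property is a stock illustration, not anything from the model's output:

```python
# Minimal working Hypothesis test, for contrast with the model's word salad.
# The sorted-list property here is a stock example chosen for illustration.
from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_sorted_is_ordered(xs):
    ys = sorted(xs)
    # Every adjacent pair should be in order after sorting.
    assert all(a <= b for a, b in zip(ys, ys[1:]))

test_sorted_is_ordered()  # Hypothesis generates and runs many random cases
```

Note that `@given` takes a strategy object (like `st.lists(...)`), not an attribute lookup like `@given(foo.bar)` — which is part of why the model's output reads as pattern-matching rather than code.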
Generally the answers aren't multiple choice. Here are a couple of examples of questions from a 5th-grade science textbook I found on Google:
How would you state your address in space? Explain your answer.
Would you weigh the same on the sun as you do on Earth? Explain your answer.
Why is it so difficult to design a real-scale model of the solar system?
That wouldn't be useful, though.
My assertion is more like:
After being given the content of elementary school science textbooks (or high school physics, or whatever other school science content makes sense), but not including the end-of-chapter questions (and especially not the answers), GPT-4 will be unable to provide the correct answer to more than 50% of the questions from the end of the chapters, constrained by having to take the first response that looks like a solution as its "answer" and not throwing away more than 3 obviously gibberish or bullshit responses per question.
And that 50% number is based on giving it every question without discrimination. If we only count the synthesis questions (as opposed to the memory/definition questions), I predict 1%, but would bet on < 10%
What do you mean by "on average after 1000 questions"? Because that is the crux of my answer: GPT-4 won't be able to QA its own work for accuracy, or even relevance.
It won't know if its reply to a prompt is actually useful.
E.g., prompt with "a helicopter is most efficient when ... "; "a helicopter is more efficient when"; and "helicopter efficiency can be improved by." GPT-4 will not be able to know which response is the best, or even whether any of the responses would actually move helicopter efficiency in the right direction.
Chance of being convicted is very weakly correlated with actual guilt, or even with prosecutor knowledge of guilt. Consider the case of an illegal search that definitively proves guilt, but cannot be presented at trial. The prosecution would probably want to proceed with their 10% case, since they know the defendant is guilty, even though the jury won't have key information.
Also, too, predicting outcomes is a skill separate from the ability to generate outcomes. The Tails Come Apart, Goodhart's law, and all that.
Also, also, too: almost no prosecutors are directly elected. Judging people by any raw metric is weird and not done; there would be some arbitrary cutoffs added to translate it into a quantified grading scale. And, as other commenters said, the root cause here is way deeper than "the DA charges 17 counts of assault, one for each blow, plus 3 counts of kidnapping, one for each grab, etc., etc., etc."
Consider also survivorship bias. If (6/5)^1000 people have been doing RR and you remember surviving 1,000 rounds, maybe "you" are just the one who happened to survive, and your counterfactual 1.2^1000 minus 1 "yous" are dead.
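The arithmetic in that survivorship story can be sketched in a couple of lines (the only inputs are 5/6 and 6/5 raised to the 1000th power):

```python
# Back-of-the-envelope check on the survivorship-bias numbers above.
# One round of Russian roulette with a six-chamber revolver is survived
# with probability 5/6, so surviving 1000 independent rounds has
# probability (5/6)**1000, and you'd need about (6/5)**1000 players for
# one survivor to be expected by luck alone.
p_survive_round = 5 / 6
p_survive_1000 = p_survive_round ** 1000
players_needed = (6 / 5) ** 1000  # = 1 / p_survive_1000

print(p_survive_1000)  # vanishingly small, on the order of 1e-79
print(players_needed)  # astronomically many counterfactual "yous"
```

So the surviving "you" is roughly a one-in-10^79 selection effect, which is the whole force of the argument.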