The objective of rationality is to become right instead of wrong.
I think this is technically false, in a subtle but important way. If I gained [knowledge of whether every six-digit number is prime] in exchange for [knowledge of whether wandering out into open traffic is a good idea], I'd have gleaned a net 899999 bits of right-ness, but it still wouldn't have been a worthwhile deal, or made me more rational in any practical sense. The missing gears are becoming right about important && relevant things, bothering to apply that knowledge, and - conditional on applying it at all - applying it well.
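(For anyone double-checking the arithmetic: this is just counting how many six-digit numbers there are, then subtracting the one bit traded away. A throwaway sketch:)

```python
# Count the six-digit numbers: 100000 through 999999 inclusive.
six_digit_count = 999_999 - 100_000 + 1  # 900000 primality facts gained

# Minus the single bit lost (the one about wandering into traffic).
net_bits = six_digit_count - 1
print(net_bits)  # 899999
```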
I think this project is good (Like, unusually good! It's a step forward! I enjoyed it, and I commend you for your service to the Cause!), but I notice a lack of emphasis on changing actions vs changing minds, both in this post and in the videos I watched, and I want to make sure you've noticed that too.
(And yes, I do recognize the irony of me pointing out a true thing about [pointing out true things without having an associated practical outcome] without having an associated practical outcome. Still think it's worth saying!)
Sorry about that; reality got in the way. I also ended up scrapping my concept for the next one, and my backup concept for it, so I have no idea when it'll actually get made (not necessarily this month), except that I plan to release on a Friday to do the standard "10 days with a choice of weekend" thing.
Damn! Mea culpa; I'll edit the original post so anyone going through the archives won't have the same problem.
Also, strong-upvoted for asking "so, with X years of hindsight, how did this pan out?" on an old post. More people should do that.
Before circumstances let me answer that question, the client got bought out by a bigger company, which was (and is) a lot more cagey about both hiring contractors and sharing internal details with outsiders; last I heard, the client's absorbed remnants are still sometimes using my modelling approach, but I have no idea how much they're using it, how much they're relying on it, or to what extent it's benefiting them.
There are no time effects in the data; past trends can in general be assumed to hold in the present.
The same way it does everything: in a weird, non-Euclidean manner which defies human intuition.
For the unreleased challenge, b) isn't for sale: making something intended to (eventually) be played by humans on LW and then using it solely as LLM-fodder would just be too sad. And I'm guessing you wouldn't want a) without b); if so, so much for that.
. . . if the "it must never be released to the public internet" constraint really is that stringent, I might be better advised to make D&D.Sci-style puzzles specifically for your purposes. The following questions then become relevant:
- How closely am I allowed to copy existing work? (This gets easier the more I can base it on something I've already done.)
- How many challenges are you likely to want, and how similar can they be to each other? (Half the difficulty on my end would be getting used to the requirements, format, etc.; I'd be more inclined to try this if I knew I could get paid for many challenges built along similar lines.)
- Is there a deadline? (When are you likely to no longer need challenges like this?) (Conversely, would I get anything extra for delivering a challenge within the next week or so?)
This seems like a natural fit for D&D.Sci games. All the ones I made are public domain, so you can use them freely (and I bet the other people who made some would give you permission if you asked them nicely). They've been publicly played by clever humans with a variety of skill levels and associated outcomes, and they're obscure enough that I doubt an LLM would have memorized the solutions (and if one has, you could tweak the names and data-generation hyperparameters to flatfoot it).
. . . I happen to have a completed-but-unreleased D&D.Sci game, which I was planning to put on LW early next month, after everyone got back from their holidays. Would it be helpful if I sent it to you and delayed the release until Feb, so you and yours could let LLMs try it first?
I am in literally the exact same situation, and think your proposed remedy makes sense.