[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

The space of all possible algorithms one could run on three-digit-addition strings like "218+375" seems rather vast. Could it be that what GPT-3 is doing is something like

  • generating a large bunch of candidate algorithms, and
  • estimating the likelihoods of those algorithms given the examples, and
  • doing something like a noisy/weak Bayesian update, and
  • executing one of the higher-posterior algorithms, or some "fuzzy combination" of them?

Obviously this is just wild, vague speculation; but to me it intuitively seems like it would at least sort of answer your question. What do you think? (Could GPT-3 be doing something like the above?)

(To a human, it might feel like [the actual algorithm for addition] is a glaringly obvious candidate. But, on something like a noisy simplicity prior over all possible string-manipulation algorithms, [the actual algorithm for addition] maybe starts looking like just one of the more conspicuous needles in a haystack?)
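To make the speculation above a bit more concrete, here is a toy simulation of the bullet-point sketch. To be clear, everything in it is made up for illustration: the handful of candidate "algorithms", the simplicity-style prior weights, and the noise parameter are all hypothetical stand-ins for whatever enormous hypothesis space a model would actually be working over.

```python
import math

# A few candidate string-manipulation "algorithms" for inputs like "218+375".
def true_add(s):
    a, b = s.split("+")
    return str(int(a) + int(b))

def concat(s):
    a, b = s.split("+")
    return a + b  # just glue the operands together

def add_no_carry(s):
    # digit-wise addition mod 10, ignoring carries
    a, b = s.split("+")
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))

def first_operand(s):
    return s.split("+")[0]

# (hypothesis, prior) pairs; the priors are made-up "simplicity" scores.
hypotheses = [
    (true_add, 0.2),
    (concat, 0.4),
    (add_no_carry, 0.2),
    (first_operand, 0.2),
]

# A few in-context examples, like few-shot prompts.
examples = [("218+375", "593"), ("102+456", "558"), ("333+333", "666")]

NOISE = 0.05  # probability that even a "correct" hypothesis mispredicts


def posterior(hypotheses, examples, noise=NOISE):
    """Noisy Bayesian update: posterior ∝ prior × likelihood of the examples."""
    log_scores = []
    for h, prior in hypotheses:
        log_p = math.log(prior)
        for x, y in examples:
            # Noisy likelihood: a match isn't certain, a mismatch isn't fatal.
            log_p += math.log(1 - noise) if h(x) == y else math.log(noise)
        log_scores.append(log_p)
    m = max(log_scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]


post = posterior(hypotheses, examples)
best = max(range(len(hypotheses)), key=lambda i: post[i])
print(hypotheses[best][0].__name__, round(post[best], 3))
```

Note how few examples it takes: `add_no_carry` happens to agree with true addition on the carry-free examples, but one example with a carry ("218+375") is enough for the posterior to concentrate heavily on `true_add`, even though `concat` started with the highest prior.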

Requesting feedback/advice: what Type Theory to study for AI safety?

Thanks for the suggestions!

Software Foundations in particular looks highly apposite, and in a useful format, to boot.

TaPL was indeed one of the books I was considering reading. (I was hesitating between that and Lambda Calculus and Combinators: an Introduction.) I've given it another point in my mental ranking.

Based on the little I understood of MIRI’s papers (the first of which also prompted me to read their paper on Vingean Reflection), they seem interesting, but currently inaccessible to me. Added an edit/update to my post.