The core B/E dichotomy rang true, but the post also seemed to imply a correlated separation between autonomous and joint success/failure modes: building couples succeed/fail on one thing together, while entertaining couples succeed/fail on two things separately.

I have not observed this to be true. Experientially, it seems more like a 2x2 grid, where the building / entertaining distinction is about the type of interaction you crave in a relationship, and the autonomous / joint distinction is about how you focus your productive energies.


  • Building / Joint: (as above) two individuals building a home / business / family together
  • Building / Autonomous: two individuals with distinct careers and interests, who both derive great meaning from helping the other achieve their goals. 
  • Entertaining / Joint: two individuals who enjoy entertainment and focus on that pursuit together. A canonical example might be childless couples who frequently travel, host parties, etc., or the "best friends who do everything together" couple everyone knows.
  • Entertaining / Autonomous: (as above) individuals with separate lives who come together for conversation, sex, etc. 

I might be extra sensitive to this: my last relationship failed because my partner wanted an "EJ" (entertaining / joint) relationship while I wanted a "BA" (building / autonomous) one, and neither of those follows cleanly from the post.

"What is intelligence?" is a question you can spend an entire productive academic career failing to answer. Intentionally ignoring the nerd bait, I do think this post highlights how important it is for AGI worriers to better articulate which specific qualities of "intelligent" agents are the most worrisome and why. 

For example, there has been a lot of handwringing over the scaling properties of language models, especially in the GPT family. But as Gary Marcus continues to point out in his inimitable and slightly controversial way, scaling these models fails to fix some extremely simple logical mistakes, mistakes that might need to be fixed by a non-scaling innovation before an intelligent agent poses an x-risk. On forums like these it has long been popular to say something along the lines of "holy shit, look how much better these models got when you add __ amount of compute! If we extrapolate that out, we are so boned." But this line of thinking seems to miss the "intelligence" part of AGI completely: it has no real sense of the nature of the gap between the models that exist today and the spooky models people worry about.

It seems to me that we need a better specification for describing what exactly intelligent agents can do and how they get there.

I'm seeking some clarification. My reading of your post is that you see the following concepts as intertwined:

  1. Efficient representation of learned information
  2. Efficient learning of information

As you point out (and I agree), transformer parameters live in a small space, and the realities of human biology seem to imply that we can do #1 better, that is, use a "lighter" algorithm with fewer free parameters to store our learned information.

If I understand you correctly, you believe that this "far more efficient architecture trying to get out" would also be better at #2 (require less data to reach this efficient representation). While I agree that an algorithm to do this better must exist, it is not obvious to me that a better compressed/sparse storage format for language models would necessarily require less data to train. 
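To make the distinction between #1 and #2 concrete, here is a toy sketch. Everything in it is a hypothetical illustration and not anything from your post: the "true" function `target`, the 1000-element domain, and both learners are assumptions chosen for simplicity. Two pipelines end up with the identical compact representation (one period of a periodic lookup table). Pipeline A memorizes every input and compresses after the fact, so it needs all 1000 samples. Pipeline B is handed a strong prior (the period is assumed known up front) and needs only 7. The storage efficiency of the end result is the same in both cases; the data cost depends on the learner's inductive bias, not on the compactness of the final format.

```python
from typing import Dict, List, Optional

# Hypothetical toy domain and "true" function (assumptions for illustration only).
DOMAIN = range(1000)


def target(x: int) -> int:
    # Highly compressible: the function repeats with period 7.
    return (3 * x) % 7


def memorize(samples: Dict[int, int]) -> Dict[int, int]:
    # Data-hungry learner: stores exactly what it has seen, no generalization.
    return dict(samples)


def compress(table: Dict[int, int]) -> Optional[List[int]]:
    # Post-hoc compression: if the table covers the whole domain, find the
    # smallest period p and keep only one period. Returns None when the
    # table is incomplete, since a memorizer cannot fill in unseen inputs.
    if set(table) != set(DOMAIN):
        return None
    for p in range(1, len(DOMAIN) + 1):
        if all(table[x] == table[x % p] for x in DOMAIN):
            return [table[x] for x in range(p)]
    return None


def learn_periodic(samples: Dict[int, int], period: int) -> List[int]:
    # Hypothetical learner with a strong built-in prior: it assumes the
    # function has the given period, so one sample per residue suffices.
    return [samples[r] for r in range(period)]


# Pipeline A: memorize everything, then compress.
samples_a = {x: target(x) for x in DOMAIN}
compact_a = compress(memorize(samples_a))

# Pipeline B: assume periodicity up front, learn from 7 samples.
samples_b = {x: target(x) for x in range(7)}
compact_b = learn_periodic(samples_b, period=7)

print(len(samples_a), compact_a)  # 1000 [0, 3, 6, 2, 5, 1, 4]
print(len(samples_b), compact_b)  # 7 [0, 3, 6, 2, 5, 1, 4]

# With only half the data, the memorize-then-compress pipeline fails entirely.
print(compress(memorize({x: target(x) for x in range(500)})))  # None
```

The design choice here is deliberately extreme: in pipeline B the prior does all the work, which is exactly the point. A compact final representation can be reached cheaply or expensively, so knowing that a lighter representation exists does not by itself tell us how much data is needed to find it.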

So, questions: Did I misunderstand you, and if so, where? Are there additional reasons you believe the two concepts to be correlated?