Interesting, I checked LW/Google for the keyword before writing and didn't see much, but maybe I missed it; it does seem like a fairly natural riff, e.g. someone wrote a similar post on EA forum a few months later.

I can imagine it being the case that their ability to reveal this information is their main source of leverage (over e.g. who replaces them on the board).

I do have substantial credence (~15%?) on AGI being built by hobbyists/small teams. I definitely think it's more likely to be built by huge teams with huge computers, like most recent advances. But my guess is that physics permits vastly more algorithmic efficiency than we've discovered, and it seems pretty plausible to me—especially in worlds with longer timelines—that some small group might discover enough of it in time.

Nonetheless, I acknowledge that my disagreement with these proposals often comes down to a more fundamental disagreement about the difficulty of alignment, rather than any beliefs about the social response to AI risk.

My guess is that this disagreement (about the difficulty of alignment) also mostly explains the disagreement about humanity’s relative attentiveness/competence. If the recent regulatory moves seem encouraging to you, I can see how that would seem like counterevidence to the claim that governments are unlikely to help much with AI risk.

But personally it doesn’t seem like much counterevidence, because the recent moves haven’t seemed very encouraging. They’ve struck me as slightly encouraging, insofar as they’ve caused me to update slightly upwards that governments might eventually coordinate to entirely halt AI development. But absent the sort of scientific understanding we typically have when deploying high-risk engineering projects—where e.g., we can answer at least most of the basic questions about how the systems work, and generally understand how to predict in advance what will happen if we do various things, etc.—little short of a Stop is going to reduce my alarm much.

AI used to be a science. In the old days (back when AI didn't work very well), people were attempting to develop a working theory of cognition.

Those scientists didn’t succeed, and those days are behind us.

I claim many of them did succeed, for example:

  • George Boole invented boolean algebra in order to establish (part of) a working theory of cognition; the book where he introduces it is titled "An Investigation of the Laws of Thought," and his stated aim was largely to help explain how minds work.[1]
  • Ramón y Cajal discovered neurons in the course of trying to better understand cognition.[2]
  • Turing described his research as aimed at figuring out what intelligence is, what it would mean for something to “think,” etc.[3]
  • Shannon didn’t frame his work this way quite as explicitly, but information theory is useful because it characterizes constraints on the transmission of thoughts/cognition between people, and I think he was clearly generally interested in figuring out what was up with agents/minds—e.g., he spent time trying to design machines to navigate mazes, repair themselves, replicate, etc.
  • Geoffrey Hinton initially became interested in neural networks because he was trying to figure out how brains worked.

Not all of these scientists thought of themselves as working on AI, of course, but I do think many of the key discoveries which make modern AI possible—boolean algebra, neurons, computers, information theory, neural networks—were developed by people trying to develop theories of cognition.

  1. ^

     The opening paragraph of Boole’s book:  "The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a Calculus, and upon this foundation to establish the science of Logic and construct its method; to make that method itself the basis of a general method for the application of the mathematical doctrine of Probabilities; and, finally, to collect from the various elements of truth brought to view in the course of these inquiries some probable intimations concerning the nature and constitution of the human mind."

  2. ^

     From Cajal’s autobiography:  "... the problem attracted us irresistibly. We saw that an exact knowledge of the structure of the brain was of supreme interest for the building up of a rational psychology. To know the brain, we said, is equivalent to ascertaining the material course of thought and will, to discovering the intimate history of life in its perpetual duel with external forces; a history summarized, and in a way engraved in the defensive neuronal coordinations of the reflex, of instinct, and of the association of ideas" (305).

  3. ^

     The opening paragraph of Turing’s paper, Computing Machinery and Intelligence:  "I propose to consider the question, 'Can machines think?' This should begin with definitions of the meaning of the terms 'machine' and 'think'. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words 'machine' and 'think' are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."

But it's not just language any longer either, with image inputs, etc... all else equal I'd prefer a name that emphasized how little we understand how they work ("model" seems to me to connote the opposite), but I don't have any great suggestions.

I just meant that Faraday's research strikes me as counterevidence for the claim I was making—he had excellent feedback loops, yet also seems to me to have had excellent pre-paradigmatic research taste/next-question-generating skill of the sort my prior suggests generally trades off against strong focus on quickly-checkable claims. So maybe my prior is missing something!

Yeah, my impression is similarly that focus on feedback loops is closer to "the core thing that's gone wrong so far with alignment research," than to "the core thing that's been missing." I wouldn't normally put it this way, since I think many types of feedback loops are great, and since obviously in the end alignment research is useless unless it helps us better engineer AI systems in the actual territory, etc. 

(And also because some examples of focus on tight feedback loops, like Faraday's research, strike me as exceedingly excellent, although I haven't really figured out yet why his work seems so much closer to the spirit we need than e.g. thinking physics problems).

Like, all else equal, it clearly seems better to have better empirical feedback; I think my objection is mostly that in practice, focus on this seems to lead people to premature formalization, or to otherwise constraining their lines of inquiry to those whose steps are easy to explain/justify along the way.

Another way to put this: most examples I've seen of people trying to practice attending to tight feedback have involved them focusing on trivial problems, like simple video games or toy already-solved science problems, and I think this isn't a coincidence. So while I share your sense, Raemon, that transfer learning seems possible here, my guess is that this sort of practice mostly transfers within the domain of other trivial problems, where solutions (or at least methods for locating solutions) are already known, and hence where it's easy to verify you're making progress along the way.

I've been trying to spend a bit more time voting in response to this, to try to help keep thread quality high; at least for now, the size of the influx strikes me as low enough that a few long-time users doing this might help a bunch.

I agree we don't really understand anything in LLMs at this level of detail, but I liked Jan highlighting this confusion anyway, since I think it's useful to promote particular weird behaviors to attention. I would be quite thrilled if more people got nerd sniped on trying to explain such things!
