DPiepgrass

Worried that typical commenters at LW care way less than I expected about good epistemic practice. Hoping I'm wrong.

Software developer and EA with interests including programming language design, international auxiliary languages, rationalism, climate science and the psychology of its denial.

Looking for someone similar to myself to be my new best friend:

❖ Close friendship, preferably sharing a house
❖ Rationalist-appreciating epistemology; a love of accuracy and precision to the extent it is useful or important (but not excessively pedantic)
❖ Geeky, curious, and interested in improving the world
❖ Liberal/humanist values, such as a dislike of extreme inequality based on minor or irrelevant differences in starting points, and a liking for ideas that may lead to solving such inequality. (OTOH, minor inequalities are certainly necessary and acceptable, and a high floor is clearly better than a low ceiling: an "equality" in which all are impoverished would be very bad)
❖ A love of freedom
❖ Utilitarian/consequentialist-leaning; preferably negative utilitarian
❖ High openness to experience: tolerance of ambiguity, low dogmatism, unconventionality, and again, intellectual curiosity
❖ I'm a nudist and would like someone who can participate at least sometimes
❖ Agnostic, atheist, or at least feeling doubts

Comments

I actually think Yudkowsky's biggest problem may be that he is not talking about his models. In his most prominent posts about AGI doom, such as this and the List of Lethalities, he needs to provide a complete model that clearly and convincingly leads to doom (hopefully without the extreme rhetoric) in order to justify the extreme rhetoric. Why does attempted, but imperfect, alignment lead universally to doom in all likely AGI designs*, when we lack familiarity with the relevant mind design space, or with how long it will take to escalate a given design from AGI to ASI?

* I know his claim isn't quite this expansive, but his rhetorical style encourages an expansive interpretation.

I'm baffled that he puts so little effort into explaining his model. In List of Lethalities he spends a few paragraphs of preamble covering some essential elements of concern (-3, -2, -1), then offers a few potentially-reasonable-but-minimally-supported assertions, before spending much of the rest of the article rattling off the various ways AGI could kill everyone. Personally I felt like he just skipped over a lot of the important topics, so I didn't bother to read it to the end.

I think there is probably some window of time after the first AGI or quasi-AGI arrives, but before the most genocide-prone AGI arrives, in which alignment work can still be done. Eliezer's rhetorical approach confusingly burns bridges with that world, as he and MIRI (and probably, by association, rationalists) will be regarded as a laughingstock when it arrives. Various techbros, including AI researchers, will be saying "well, AGI came and we're all still alive, yet there's EY still reciting his doomer nonsense". EY will uselessly protest "I didn't say AGI would necessarily kill everyone right away" while the techbros retweet old EY quotes that kinda sound like that's what he was saying.

Edit: for whoever disagreed & downvoted: what for? You know there are e/accs on Twitter telling everyone that the idea of x-risk is based on Yudkowsky being "king of his tribe", and surely you know that this is not how LessWrong is supposed to work. The risk isn't supposed to rest on EY's say-so; a complete and convincing model is needed. If, on the other hand, you disagreed that his communication is incomplete and unconvincing, it should not offend you that not everyone agrees. Like, holy shit: you think humanity will cause an apocalypse because it's not listening to EY, but how dare somebody suggest that EY needs better communication. I wrote this comment because I think it's very important; what are you here for?

P.S. if I'm wrong about the timeline―if it takes >15 years―my guess for how I'm wrong is (1) a major downturn in AGI/AI research investment and (2) executive misallocation of resources. I've been thinking that the brightest minds of the AI world are working on AGI, but maybe they're just paid a lot because there are too few minds to go around. And when I think of my favorite MS developer tools, they have greatly improved over the years, but there are also fixable things that haven't been fixed in 20 years, and good ideas they've never tried, and MS has created a surprising number of badly designed libraries (not to mention products) over the years. And I know people close to Google have a variety of their own pet peeves about Google.

Are AGI companies like this? Do they burn mountains of cash to pay otherwise-average engineers who happen to have AI skills? Do they tend to ignore promising research directions because the results are uncertain, or because results won't materialize in the next year, or because they don't need a supercomputer or aren't based mainly on transformers? Are they bad at creating tools that would've made the company more efficient? Certainly I expect some companies to be like that.

As for (1), I'm no great fan of copyright law, but today's AI companies are probably built on a foundation of rampant piracy, and litigation might kill investment. Or investors may be scared away by a persistent lack of discoveries that increase reliability / curtail hallucinations.

Doesn't the problem have no solution without a spare block?

Worth noting that LLMs don't see a nicely formatted numbered list; they see a linear sequence of tokens. For example, I can replace all my newlines with something else and Copilot still gets it:
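Roughly like this (a made-up stand-in for what I tried, not my actual test; the separator and list contents here are arbitrary):

```python
# Made-up sketch: flatten a numbered list by replacing newlines with an
# arbitrary separator before showing it to the model. In my informal
# experience, a Copilot-style model still continues the pattern.
prompt = "1. Monday\n2. Tuesday\n3. Wednesday\n4."
flattened = prompt.replace("\n", " ~ ")
print(flattened)  # -> "1. Monday ~ 2. Tuesday ~ 3. Wednesday ~ 4."
```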

Brief testing doesn't show worse completions than when the newlines are present (and in the version with newlines, this particular completion is oddly incomplete).

Anyone know how LLMs tend to behave on text that is ambiguous―or unambiguous but "hard to parse"? I wonder if they "see" a superposition of meanings "mixed together" and produce a response that "sounds good for the mixture".

I'm having trouble discerning a difference between our opinions, as I expect a "kind-of AGI" to come out of LLM tech, given enough investment. Re: code assistants, I'm generally disappointed with GitHub Copilot. It's not unusual for me to think "wow, good job", but bad completions are commonplace, especially when I ask a question in the sidebar (which should be using a bigger LLM). Its (very hallucinatory) response typically demonstrates that it doesn't understand our (relatively small) codebase very well, to the point where I only occasionally bother asking. (I keep wondering "did no one at GitHub think to generate an outline of the app that could fit in the context window?")
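To illustrate the kind of outline I mean (a rough sketch of my own invention, not anything GitHub actually does; the use of Python's ast module and the character budget are arbitrary choices):

```python
# Sketch: build a compact outline of a repo (file paths plus top-level
# class/function names) that could fit in a context window alongside the
# user's question, instead of dumping raw file contents.
import ast
from pathlib import Path

def outline(repo_root: str, max_chars: int = 8000) -> str:
    lines = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        symbols = [node.name for node in tree.body
                   if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))]
        lines.append(f"{path}: {', '.join(symbols) or '(no top-level defs)'}")
    return "\n".join(lines)[:max_chars]  # crude truncation to respect a context budget

print(outline("."))
```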

A title like "some people can notice more imperfections than you (and they get irked)" would be more accurate and less clickbaity, though when written like that it sounds kind of obvious.

Do you mean the "send us a message" popup at bottom-right?

Yikes! Apparently I said "strong disagree" when I meant "strong downvote". Fixed. Sorry. Disagree votes generally don't bother me either, they just make me curious what the disagreer disagrees about.

Shoot, I forgot that high-karma users have a "small-strength" of 2, so I can't tell if it was strong-downvoted or not. I mistakenly assumed it was a newish user. Edit: P.S. I might feel better if karma was hidden on my own new comments, whether or not they are hidden on others, though it would then be harder to guess at the vote distribution, making the information even more useless than usual if it survives the hiding-period. Still seems like a net win for the emotional benefits.

I wrote a long comment and someone took the "strong downvote, no disagreement, no reply" tack again. Poppy-cutters[1] seem way more common at LessWrong than I could've predicted, and I'd like to see statistics to see how common they are, and to see whether my experience here is normal or not.

  1. ^

    Meaning users who just seem to want to make others feel bad about their opinion, especially unconstructively. Interesting statistics might include the distributions of users who strong-downvote a positive score into a negative one (as compared to their other voting behaviors), who downvote more than they disagree, who downvote more than they upvote, who strong-downvote more than they downvote, and/or strong-downvote more often than they reply. Also: is there an 80/20 rule like "20% of the people do 80% of the downvotes"? Is the distribution different for "power users" than "casual users"?

Given that I think LLMs don't generalize, I was surprised how compelling Aschenbrenner's case sounded when I read it (well, the first half of it; I'm short on time...). He seemed to have taken all the same evidence I knew about and arranged it into a very different framing. But I also felt like he underweighted criticism from the likes of Gary Marcus. To me, the illusion of LLMs being "smart" has been broken for a year or so.

To the extent LLMs appear to build world models, I think what you're seeing is a bunch of disorganized neurons and connections that, when probed with a systematic method, can be mapped onto things that we know a world model ought to contain. A couple of important questions are 

  1. the way that such a world model was formed and
  2. how easily we can figure out how to form those models better/differently[1].
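By "probed with a systematic method" I mean something like a linear probe over hidden activations. A toy sketch (using random stand-in data rather than real transformer states; the "world feature" and dimensions are made up for illustration):

```python
# Toy sketch of linear probing: fit a linear classifier on "activations" to
# recover a "world feature". The data here is random and purely illustrative;
# real probing work uses actual hidden states from a trained model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))        # stand-in for hidden states
feature = (acts[:, 0] > 0).astype(int)     # stand-in "world feature"

probe = LogisticRegression(max_iter=1000).fit(acts[:800], feature[:800])
print("probe accuracy:", probe.score(acts[800:], feature[800:]))
```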

I think LLMs get "world models" (which don't in fact cover the whole world) in a way that is quite unlike the way intelligent humans form their own world models―and more like how unintelligent or confused humans do the same.

The way I see it, LLMs learn in much the same way a struggling D student learns (if I understand correctly how such a student learns), and the reason LLMs sometimes perform like an A student is that they have extra advantages that regular D students do not: unlimited attention span and ultrafast, ultra-precise processing backed by an extremely large set of training data. So why do D students perform badly, even with "lots" of studying? I think it's either because they are not trying to build mental models, or because they don't really follow what their teachers are saying. Either way, this leads them to fall back on a secondary "pattern-matching" learning mode which doesn't depend on a world model.

If, when learning in this mode, you see enough patterns, you will learn an implicit world model. The implicit model is a proper world model in terms of predictive power, but 

  1. It requires much more training data to predict as well as a human system-2 can, which explains why D students perform worse than A students on the same amount of training data―and this is one of the reasons why LLMs need so much more training data than humans do in order to perform at an A level (other reasons: less compute per token, fewer total synapses, no ability to "mentally" generate training data, inability to autonomously choose what to train on). The way you should learn is to first develop an explicit world model via system-2 thinking, then use system-2 to mentally generate training data which (along with external data) feeds into system-1. LLMs cannot do this.
  2. Such a model tends to be harder to explain in words than an explicit world model, because the predictions come from system-1 without much involvement from system-2, so much of the model is not consciously visible to the student, nor is it properly connected to its linguistic form. The D student therefore relies more on "feeling around" the system-1 model via queries (e.g. to figure out whether "citation" is a noun, you can ask your system-1 whether "the citation" is a valid phrase). Human language skills tend to develop as pure system-1 initially, so a good linguistics course explicitly teaches you to perform these queries to extract information, whereas if you have a mostly-system-2 understanding of a language, you can use system-2 to decide whether a phrase is correct even without an intuition about it. (My system-1 for Spanish is badly underdeveloped, so I lean on my superior system-2/analytical understanding of grammar.)

    When an LLM cites a correct definition of something as if from a textbook, then immediately afterward fails to apply that definition to the question you ask, I think that indicates the LLM doesn't really have a world model with respect to that question. But I would go further: even if it has a good world model, it cannot express that model in words; it can only recite the textbook definitions it has seen and then apply its implicit world model, which may or may not match what it said verbally.

So if you just keep training it on more unique data, eventually it "gets it", but I think it "gets it" the way a D student does: implicitly, not explicitly. With enough experience, the D student can become competent, but never as good as similarly-experienced A students.

A corollary of the above is that I think the amount of compute required for AGI is wildly overestimated, if not by Aschenbrenner himself then by less nuanced versions of his style of thinking (e.g. Sam Altman). And much of the danger of AGI follows from this. On a meta level, my own opinions on AGI are mostly not formed via "training data", since I have not read/seen that many articles and videos about AGI alignment (compared to any actual alignment researcher). No coincidence, then, that I was always an A to A- student, and the one time I got a C- in a technical course was when I couldn't figure out WTF the professor was talking about. I still learned (that's why I got a C- rather than failing), but I learned in a way that seemed unnatural to me, one that incorporated some of the "brute force" that an LLM would use. I'm all about mental models and evidence; LLMs are about neither.

Aschenbrenner did help firm up my sense that current LLM tech leads to "quasi-AGI": a competent humanlike digital assistant, probably one that can do some AI research autonomously. It appears that the AI industry (or maybe just OpenAI) is on an evolutionary approach of "let's just tweak LLMs and our processes around them". This may lead (via human ingenuity or chance discovery) to system-2s with explicit world models, but without some breakthrough, it just leads to relatively safe quasi-AGIs, the sort that probably won't generate groundbreaking new cancer-fighting ideas but might do a good job testing ideas for curing cancer that are "obvious" or human-generated or both.

Although LLMs badly suck at reasoning, my AGI timelines are still kinda short―roughly 1 to 15 years for "real" AGI, with quasi-AGI in 2 to 6 years―mainly because so much funding is going into this, because only one researcher needs to figure out the secret, because so much research is being shared publicly, because there should be many ways to do AGI, and because quasi-AGI (if invented first) might help create real AGI. Even the AGI safety people[2] might be the ones to invent AGI, for how else will they do effective safety research? FWIW my prediction is that quasi-AGI consists of a transformer architecture with quite a large number of (conventional software) tricks and tweaks bolted on, while real AGI consists of a transformer architecture plus a smaller number of tricks and tweaks, plus a second breakthrough of the same magnitude as the transformer architecture itself (or a pair of ideas that work so well together that combining them counts as a breakthrough).

EDIT: if anyone thinks I'm on to something here, let me know your thoughts as to whether I should redact the post, lest changing minds in this regard be itself hazardous. My thinking for now, though, is that presenting ideas to a safety-conscious audience might well be better than safetyists nodding along to a mental model that I think is, if not incorrect, then poorly framed.

  1. ^

    I don't follow ML research, so let me know if you know of proposed solutions already.

  2. ^

    Why are we calling it "AI safety"? I think this term generates a lot of "the real danger of AI is bias/disinformation/etc." responses, which should decrease if we make the actual topic clear.
