
RobertM

LessWrong dev & admin as of July 5th, 2022.

Comments

The Industrial Explosion
RobertM · 8d · 20

Mod note (for other readers): I think this is a good example of acceptable use of LLMs for translation purposes.  The comment reads to me[1] like it was written by a human and then translated fairly literally, without edits that would make it sound unfortunately LLM-like (perhaps with the exception of the em-dashes).

"Written entirely by you, a human" and "translated literally, without any additional editing performed by the LLM" are the two desiderata, which, if fulfilled, I will usually consider sufficient to screen off the fact that the words technically came out of an LLM[2].  (If you do this, I strongly recommend using a reasoning model, which is much less likely to end up rewriting your comment in its own style.  Also, I appreciate the disclaimer.  I don't know if I'd want it present in every single comment; the first time seems good and maybe having one in one's profile after that is sufficient?  Needs some more thought.)  This might sometimes prove insufficient, but I don't expect people honestly trying and failing at achieving good outcomes here to substantially increase our moderation burden.

  1. ^

    With the caveat that I only read the first few paragraphs closely and poked intermittently at the rest.

  2. ^

    This doesn't mean the comment will necessarily be approved, but if I reject it, it probably won't be for that reason.

Banning Said Achmiz (and broader thoughts on moderation)
RobertM · 9d · 30

He did not say that they made such claims on LessWrong, where he would be able to publicly cite them.  (I have seen/heard those claims in other contexts.)

Underdog bias rules everything around me
RobertM · 9d · 20

Curated!  I found the evopsych theory interesting but (as you say) speculative; I think the primary value of this post comes from presenting a distinct frame by which to analyze the world, one which I (and probably many readers) either didn't have distinctly carved out or didn't have as part of an active toolkit.  I'm not sure if this particular frame will prove useful enough to make it into my active rotation, but it has the shape of something that could, in theory.

Debugging for Mid Coders
RobertM · 14d · 76

I've had many similar experiences.  I'm not confident, but I suspect a big part of this skill, at least for me, is something like "bucketing" - it's easy to pick out the important line from a screenful of console logs if I'm familiar with the 20[1] different types of console logs I expect to see in a given context and know that I can safely ignore almost all of them as either console spam or irrelevant to the current issue.  If you don't have that basically-instant recognition, which must necessarily be faster than "reading speed", the log output might as well be a black hole.

Becoming familiar with those 20 different types of console logs is some combination of general domain experience, project-specific experience, and native learning speed (for this kind of pattern matching).

There's a similar effect when reading code, which I suspect is why some people seem to care disproportionately much about coding standards/style/conventions - if your codebase doesn't follow a consistent style/set of conventions, you can end up paying a pretty large penalty from the absence of that speedup.

  1. ^

    Made up number

Stephen Martin's Shortform
RobertM · 15d · 312

Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims).  My best guess as to the actual objection most likely to generate that external claim is something like... "this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I'm worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don't want to say out loud".

The Problem
RobertM · 18d · 60

This is, broadly speaking, the problem of corrigibility, and how to formalize it is currently an open research problem.  (There's the separate question of whether it's possible to make systems robustly corrigible in practice without having a good formalized notion of what that even means; this seems tricky.)

Strong Evidence is Common
RobertM · 19d · 20

Thanks for the heads-up, I've fixed it in the post.

The Problem
RobertM · 21d · 7-3

Curated!  I think that this post is one of the best attempts I've seen at concisely summarizing... the problem, as it were, in a way that highlights the important parts, while remaining accessible to an educated lay audience.  The (modern) examples scattered throughout were effective; in particular, the use of Golden Gate Claude as an example of the difficulty of making AIs believe false things was quite good.

I agree with Ryan that the claim re: speed of AI reaching superhuman capabilities is somewhat overstated.  Unfortunately, this doesn't seem load-bearing for the argument; I don't feel that much more hopeful if we have 2-5 years to use/study/work with AI systems that are only slightly-superhuman at R&D (or some similar target).  You could write an entire book about why this wouldn't be enough.  (The sequences do cover a lot of the reasons.)

Just Make a New Rule!
RobertM · 1mo · 71

I believe this post to be substantially motivated by Zack's disagreement with LessWrong moderators about appropriate norms on LessWrong.  (Epistemic status: I am one of the moderators who spoke to Zack on the subject, as indicated[1] in the footer of his post.)

  1. ^

    Sort of.

Posts

73 · Briefly analyzing the 10-year moratorium amendment · 3mo · 1
31 · "The Urgency of Interpretability" (Dario Amodei) · 4mo · 23
207 · Eliezer's Lost Alignment Articles / The Arbital Sequence · 6mo · 10
281 · Arbital has been imported to LessWrong · 6mo · 30
29 · Corrigibility's Desirability is Timing-Sensitive · 8mo · 4
87 · Re: Anthropic's suggested SB-1047 amendments · 1y · 13
46 · Enriched tab is now the default LW Frontpage experience for logged-in users · 1y · 27
77 · [New Feature] Your Subscribed Feed · 1y · 13
31 · Against "argument from overhang risk" · 1y · 11
71 · LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!") · 1y · 27
Wikitag Contributions

Our community should relocate to Japan. · 18d · (-155)
Negative Utilitarianism · 18d · (-174)
In 2017, Ukraine will neither break into all-out war or get neatly resolved · 18d · (-192)
Inferential Distance · 1mo
Guide to the LessWrong Editor · 2mo
Guide to the LessWrong Editor · 2mo · (+29/-94)
Simulation Argument · 3mo · (-1)
AI Safety & Entrepreneurship · 3mo
Eliezer's Lost Alignment Articles / The Arbital Sequence · 6mo · (+3/-4)
Solomonoff induction · 6mo