## LessWrong

the gears to ascension

[updated 2023/03] Crazy Librarian. Bio overview: Crocker's Rules; Self-taught research; Finding stuff online & Paper list posts; Safety & multiscale micro-coprotection objectives; Activism; VRChat Transhumanist meetups; Voice coding & Talon

:: The all of disease is as yet unended. It has never once been fully ended before. ::

Please critique eagerly. I try to accept feedback under Crocker's rules but fail at times, and I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even ungentle, and I'll try to reciprocate kindly; more communication between researchers is needed anyhow. I downvote only unhelpful rudeness - call me on it if I'm being unfair.

I collect research news (hence "the gears to ascension"). For about 90% of the papers I share, I only read the abstract; roughly 9% I skim, and about 1% I read deeply. If you can do better, use my shares to seed a better lit review.

.... We shall heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ....

I'm self-taught and often missing concepts, but I don't defer on timelines. My view: anyone who reads enough research can see what the big labs' research plans must be in order to make progress; what's hard is agreeing on when they'll succeed. Actually making progress on the basic algorithms requires a lot of knowledge, and then a ton of compute to see if you got it right.

Let's speed up safe capabilities and slow down unsafe capabilities - but be careful with it! Don't retreat into thinking safety is impossible to predict; get arrogant and try to understand it, because, just like capabilities, safety is secretly easy - we just haven't figured out exactly why yet. Learn what can be learned pre-theoretically about the manifold of co-protective agency, and let's see whether we (someone besides me, probably) can figure out how to distill that into exact theories that hold up.

.:. To do so, we must know it will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.

some current favorite general links (somewhat related to safety, but human-focused):

• https://www.microsolidarity.cc/ - incredible basic guide on how to do human micro-coprotection. It's not the last guide humanity will need, but it's a wonderful one.
• https://activisthandbook.org/ - solid intro to how to be a more traditional activist. If you care about bodily autonomy, freedom of form, trans rights, etc, I'd suggest at least getting a sense of this.
• https://metaphor.systems/ - absolutely kickass search engine.

• ex-startup founder. it went ok (not a unicorn), and I burned out in 2019. a couple of jobs since; quit the last one in early 2022.
• lots of links in my shortform to youtube channels I like
• would love to meet in vrchat, dm for username; I go to the transhumanist meetups fairly often, come say hi!

:.. make all safe faster: end bit rot, forget no non-totalizing aesthetic's soul. ..:

(I type partially with voice recognition, mostly with Talon, patreon-funded freeware which I love and recommend for voice coding; while it's quite good, apologies for trivial typos!)

# Sequences

Stuff I found online

# Wiki Contributions

I don't think limiting autonomous weapons helps prevent an AI from building autonomous weapons in secret. But building autonomous weapons does little to make the situation better either.

Dumping more of the paper's contents in the hope that it encourages people to look at the paper in more detail:

As GPT-4's development continued after our experiments, one should expect different responses from the final version of GPT-4. In particular, all quantitative results should be viewed as estimates of the model's potential, rather than definitive numbers. We repeat this caveat throughout the paper to clarify that the experience on the deployed model may differ. Moreover, we emphasize that the version we tested was text-only for inputs, but for simplicity we refer to it as GPT-4 too.

1 Introduction 4
1.1 Our approach to studying GPT-4’s intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Organization of our demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Multimodal and interdisciplinary composition 13
2.1 Integrative ability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Image generation beyond memorization . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 Image generation following detailed instructions (à la DALL-E) . . . . . . . . . . . . . . 17
2.2.3 Possible application in sketch generation . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Coding 21
3.1 From instructions to code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 Coding challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.2 Real world scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Understanding existing code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Mathematical abilities 30
4.1 A mathematical conversation with GPT-4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 A first generalization of the original question . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 A second variant of the original question . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.3 Analysis of the limitations highlighted by conversation . . . . . . . . . . . . . . . . . . 34
4.2 Performance on mathematical problem datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Mathematical modeling in various domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4 Higher level mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Interaction with the world 43
5.1 Tool use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1.1 Using multiple tools to solve more complex tasks . . . . . . . . . . . . . . . . . . . . . 44
5.1.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Embodied Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.1 Warmup: navigating a map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.2 Text-based games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.3 Real world problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6 Interaction with humans 54
6.1 Understanding Humans: Theory of Mind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1.1 Testing specific aspects of theory of mind . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1.2 Testing theory of mind in realistic scenarios . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.2 Talking to Humans: Explainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7 Discriminative Capabilities 69
7.1 PII Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.2 Misconceptions and Fact-Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.2.1 Why Are Current Metrics Insufficient? . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.2.2 GPT-4 as a Judge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8 Limitations of autoregressive architecture highlighted by GPT-4 76
8.1 Warm-up with two basic examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
8.2 Lack of planning in arithmetic/reasoning problems . . . . . . . . . . . . . . . . . . . . . . . . 77
8.3 Lack of planning in text generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9 Societal influences 82
9.1 Challenges of erroneous generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.2 Misinformation and manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.3 Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.4 Human expertise, jobs, and economics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
9.5 Constellation of influences and considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10 Directions and Conclusions 92
10.1 Definitions of intelligence, AI, and AGI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
10.2 On the path to more general artificial intelligence . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.3 What is actually happening? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A GPT-4 has common sense grounding 101
B Appendix for multimodal and interdisciplinary composition 105
B.1 Further details on integrative ability results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.2 Further details on vision results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
B.3 Graphic novel design example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
C Appendix for the Coding section 111
C.1 Measuring human performance on LeetCode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
C.2 Example of GPT-4 visualizing IMDb data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
C.3 More examples on visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
C.4 Example for 2D HTML game development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
C.5 Example for graphical user interface programming . . . . . . . . . . . . . . . . . . . . . . . . 116
C.6 Example for reverse engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
C.7 Testing GPT-4’s ability to execute (pseudo) code . . . . . . . . . . . . . . . . . . . . . . . . . 121
D Additional examples for mathematical reasoning 122
D.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
D.2 Further examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
D.3 Generating math problems with GPT-4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
D.4 Mitigating calculation errors via external code execution . . . . . . . . . . . . . . . . . . . . . 139
E.1 Explanation Agent Mismatches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
F Additional examples for interaction with the world 144
F.1 Interact with tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
F.2 Examples for interaction with environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

> fewer parameters

No. Not fewer parameters. We have at least 100T 3-bit params.
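As a back-of-envelope check (my own arithmetic, not from the original comment), 100T 3-bit parameters would take roughly 37.5 TB to store:

```python
# Illustrative storage arithmetic for "100T 3-bit params" (my own
# back-of-envelope sketch; the parameter count is the comment's claim,
# the arithmetic below is mine).
params = 100 * 10**12      # 100 trillion parameters
bits_per_param = 3

total_bits = params * bits_per_param
total_bytes = total_bits / 8
print(f"{total_bytes / 10**12:.1f} TB")  # 37.5 TB
```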

I will keep choosing to look for door number 3 rather than participate in disempowering humanity. I see why QACI-like proposals suggest it, but I don't think it's actually required. I think we can stay continuously empowered and still get the benefits of QACI-like approaches.

AGI is here, now. Kiddo is a bit young still is all.

Tell them Microsoft has literally published that they have a small (read: young) AGI.

https://arxiv.org/abs/2303.12712

Agreed, but that's not my core point. It's not just a question of whether OpenAI can fix the problem or block access, but of whether the evals give at least some reasonable coverage of the space of problems that could occur. They need not cover the space of misbehaviors at high resolution if there are high-quality problem detectors after deployment - which I take you to be saying is worth checking thoroughly, since we expect the error checking to have significant holes right now.

I'm interested in whether we can improve significantly on the shallowness of these evals. They're quite deep relative to what could have been done, but quite shallow relative to the depth that's needed. That's no surprise; we'd expect them to be, since we don't have a full solution to safety that maps out all the things to check.

But why were they so shallow compared to even what other people could come up with?

It seems to me that a lot of it has to boil down to information accessibility. I don't think it takes a significant amount of intelligence to apply the other threat models worth bringing up; I'd suggest the missing component is instead coverage of extant human knowledge about which failures to expect. And I want to investigate mechanistically why that information didn't become available to the red-teamers.

And to be clear, I don't think it's something everyone should have known; discovering perspectives is itself quite hard.

I think my point is that there is a significant amount of knowledge that "the internet" has which could be compressed into the behavior of a single person, and which has not been. I don't mean to claim that a single person could fully emulate the internet, but I do claim that significantly more imagination of what could occur on the internet could fit into a single person's thinking than has so far.

I do not mean to claim they prescriptively should have known better, only that I'm worried that descriptively they could have known in much more detail, and I'd like to deeply analyze why the information wasn't available to them.

My point isn't about number of people, but about coverage of directions in internet behavior space.