# All of Michaël Trazzi's Comments + Replies

What will GPT-4 be incapable of?

Sorry, I meant a bot that plays random moves, not a randomly sampled Go bot from KGS. Agreed that GPT-4 won't beat an average Go bot.

What will GPT-4 be incapable of?

Let's say that by concatenating your textbooks you get plenty of examples of  with "blablabla object sky blablabla gravity  blablabla  blabla". And then the exercise is: "blablabla object of mass blablabla thrown from the sky, what's the force? a) f=120 b) ... c) ... d) ...". Then all you need to do is some prompt programming at the beginning, by "for looping answer" and teaching it to return either a, b, c or d. Now, I don't see any reason why a neural net couldn't approximate linear fun...

Ericf (7d): Generally the answers aren't multiple choice. Here are a couple of examples of questions from a 5th grade science textbook I found on Google:

1. How would you state your address in space? Explain your answer.
2. Would you weigh the same on the sun as you do on Earth? Explain your answer.
3. Why is it so difficult to design a real-scale model of the solar system?

What will GPT-4 be incapable of?

I think testing in general seems AGI-complete, in the sense that you need to understand the edge cases of a function (or the correct output for "normal" input).

If we take the simplest testing case, let's say Python using pytest, with typed code and some simple tests for each type (e.g. 0 and 1 for integers, empty/random strings, etc.), then you could show it examples of how to generate tests from function names... but then you could also just do that with regexes, so I guess with Hypothesis.

So maybe the right question to ask is: what do you expect GPT-...
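One way to picture the "simple tests for each type" baseline is a minimal, standard-library-only sketch: read a function's type hints and call it on every combination of a few hand-picked edge cases per type, recording any crash. The `EDGE_CASES` table and the `edge_case_inputs`, `smoke_test`, `repeat` and `ratio` functions are hypothetical names made up for illustration; Hypothesis's `st.from_type` strategy does a far more thorough version of this.

```python
import inspect
import itertools
from typing import get_type_hints

# Hypothetical hand-picked edge cases per type (e.g. 0 and 1 for integers,
# empty/short strings); a real tool would cover many more types.
EDGE_CASES = {
    int: [0, 1, -1],
    str: ["", "a", "hello world"],
    bool: [True, False],
}


def edge_case_inputs(func):
    """Return one kwargs dict per combination of edge cases for func's params."""
    hints = get_type_hints(func)
    names = list(inspect.signature(func).parameters)
    pools = [EDGE_CASES[hints[name]] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*pools)]


def smoke_test(func):
    """Call func on every edge-case combination and collect the crashes."""
    failures = []
    for kwargs in edge_case_inputs(func):
        try:
            func(**kwargs)
        except Exception as exc:  # any crash counts as a failing test
            failures.append((kwargs, exc))
    return failures


# Two hypothetical functions under test.
def repeat(s: str, n: int) -> str:
    return s * n


def ratio(a: int, b: int) -> float:
    return a / b


if __name__ == "__main__":
    print(len(edge_case_inputs(repeat)))  # 3 strings x 3 ints = 9 combinations
    print(smoke_test(repeat))             # no crashes
    for kwargs, exc in smoke_test(ratio):
        print(kwargs, type(exc).__name__)  # the b=0 cases
```

Here `smoke_test(repeat)` finds nothing, while `smoke_test(ratio)` reports the three `b=0` combinations raising `ZeroDivisionError`. This only catches crashes, not wrong outputs, which is where understanding a function's edge cases actually gets hard.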

Zac Hatfield Dodds (7d): Testing in full generality is certainly AGI-complete (and a nice ingredient for recursive self-improvement!), but I think you're overestimating the difficulty of pattern-matching your way to decent tests. Chess used to be considered AGI-complete too; I'd guess testing is more like poetry+arithmetic in that if you can handle context, style, and some details it comes out pretty nicely. I expect GPT-4 to be substantially better at this 'out of the box' due to

* the usual combination of larger, better at generalising, scaling laws, etc.
* super-linear performance gains on arithmetic-like tasks due to generalisation, with spillover to code-related tasks
* the extra GitHub (and blog, etc.) data, which is probably pretty helpful given steady adoption since ~2015 or so

---

Example outputs from Ghostwriter vs GPT-3:

```shell
$ hypothesis write gzip.compress
```

```python
import gzip
from hypothesis import given, strategies as st

@given(compresslevel=st.just(9), data=st.nothing())
def test_roundtrip_compress_decompress(compresslevel, data):
    value0 = gzip.compress(data=data, compresslevel=compresslevel)
    value1 = gzip.decompress(data=value0)
    assert data == value1, (data, value1)
```

while GPT-3 tends to produce examples like (first four that I generated just now):

```python
@given(st.bytes(st.uuid4()))
def test(x):
    expected = x
    result = bytes(gzip(x))
    assert bytes(result) == expected

@given(st.bytes())
def test_round_trip(xs):
    compressed, uncompressed = gzip_and_unzip(xs)
    assert is_equal(compressed, uncompressed)

@given(st.bytes("foobar"))
def test(xs):
    assert gzip(xs) == xs

@given(st.bytes())
def test(xs):
    zipped_xs = gzip(xs)
    uncompressed_xs = zlib.decompress(zipped_xs)
    assert zipped_xs == uncompressed_xs
```

So it's clearly 'getting the right idea', even without any fine-tuning at all, but not there yet. It's also a lot worse at this without a natural-language descrip

What will GPT-4 be incapable of?

Well, if we're doing a bet, then at some point we need to "resolve" the prediction. So we ask GPT-4 the same physics question 1000 times and then some human judges count how many it got right. If it gets it right more than, let's say, 95% of the time (or whatever confidence interval we pick), then we resolve this positively. Of course you could do more than 1000, and by the law of large numbers the success rate should converge to the true probability of giving the right answer?

Ericf (7d): That wouldn't be useful, though. My assertion is more like: after getting the content of elementary school science textbooks (or high school physics, or whatever other school science content makes sense), but not including the end-of-chapter questions (and especially not the answers), GPT-4 will be unable to provide the correct answer to more than 50% of the questions from the end of the chapters, constrained by having to take the first response that looks like a solution as its "answer" and not throwing away more than 3 obviously gibberish or bullshit responses per question. And that 50% number is based on giving it every question without discrimination. If we only count the synthesis questions (as opposed to the memory/definition questions), I predict 1%, but would bet on < 10%.

What will GPT-4 be incapable of?

Re right prompt: GPT-3 has a context window of 2048 tokens, so this limits quite a lot what it could do. Also, it's not accurate at two-digit multiplication (which you would need at the very least to multiply your $ amounts by percentages), and it's even worse at 5-digit. So in this case, we're sure it can't do your taxes. And in the more general case, gwern wrote some debugging steps to check whether the problem is GPT-3 or your prompt.

Now, for GPT-4, if they keep scaling the same way, it won't be possible to have accurate enough multi-digit multiplication (like 4-5 digits, cf. this thread) but with...

What will GPT-4 be incapable of?

So physics understanding.

How do you think it would perform on simpler questions closer to its training dataset, like "we throw a ball from a 500m building with no wind, and the same ball but with wind: which one hits the floor earlier?" (on average, after 1000 questions)? If this still does not seem plausible, what is something you would bet $100 on at 2:1, but not at 1:1, that it would not be able to do?

Ericf (7d): What do you mean by "on average after 1000 questions"? Because that is the crux of my answer: GPT-4 won't be able to QA its own work for accuracy, or even relevance.
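To make "on average, after 1000 questions" (and the 95% resolution rule suggested earlier in the thread) concrete, here is a minimal sketch, assuming the judges' tally of correct answers is the only input: compute the observed success rate and a normal-approximation (Wald) 95% interval, and resolve positively only if the whole interval clears the threshold. The function names and the tallies are hypothetical, not real results.

```python
import math


def success_interval(correct, total, z=1.96):
    """Observed success rate and a normal-approximation (Wald) interval."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


def resolves_positively(correct, total, threshold=0.95):
    """Resolve the bet 'yes' only if the interval's lower bound beats the threshold."""
    _, lower, _ = success_interval(correct, total)
    return lower > threshold


# Hypothetical tallies from the human judges (illustrative, not real results).
for correct in (960, 970):
    p, lo, hi = success_interval(correct, 1000)
    print(correct, round(p, 3), round(lo, 4), round(hi, 4),
          resolves_positively(correct, 1000))
```

With 960/1000 correct, the observed rate (0.96) is above 95% but the interval's lower end (about 0.948) is not, so this rule would not resolve positively; 970/1000 would. More trials shrink the interval, in line with the law of large numbers argument. A Wilson interval is a common, better-behaved alternative near 0 or 1.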
What will GPT-4 be incapable of?

Interesting. Apparently GPT-2 could make (up to?) 14 non-invalid moves. Also, this paper reports a cross-entropy log-loss of 0.7 and 10% invalid moves after fine-tuning on 2.8M chess games. So maybe data is the bottleneck here, but assuming it's not, GPT-4's overall loss would be x smaller than GPT-2's (cf. Fig. 1 on parameters), and with the strong assumptions of the overall loss transferring directly to chess loss, and of chess invalid-move accuracy being inversely proportional to chess loss wins, then it would make...

What will GPT-4 be incapable of?

So from 2-digit subtraction to 5-digit subtraction it lost 90% accuracy, and scaling the model by ~10x gave a 3x improvement (from 10% to 30%) on two-digit multiplication. So assuming we get 3x more accuracy from each 10x increase, and that 100% on two-digit corresponds to ~10% on 5-digit, we would need something like 3 more scalings like "13B -> 175B", so about 400 trillion params.
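The extrapolation above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch under the comment's own assumptions (each "13B -> 175B"-sized jump triples accuracy; 100% on two-digit corresponds to ~10% on 5-digit); `extrapolate` and the constants are illustrative names, not measurements.

```python
# Back-of-the-envelope version of the extrapolation in the comment above.
# All inputs are the comment's assumptions, not measurements.
BASE_PARAMS = 175e9               # GPT-3, 175B parameters
SCALE_STEP = 175e9 / 13e9         # one "13B -> 175B"-sized jump (~13.5x)
ACC_GAIN_PER_STEP = 3.0           # 10% -> 30% on 2-digit multiplication
TWO_TO_FIVE_DIGIT_RATIO = 10.0    # assumed: 100% on 2-digit ~ 10% on 5-digit


def extrapolate(n_steps, two_digit_now=0.30):
    """Yield (step, parameter count, projected 5-digit accuracy) per scaling step."""
    five_digit_now = two_digit_now / TWO_TO_FIVE_DIGIT_RATIO
    for step in range(n_steps + 1):
        params = BASE_PARAMS * SCALE_STEP ** step
        accuracy = min(1.0, five_digit_now * ACC_GAIN_PER_STEP ** step)
        yield step, params, accuracy


for step, params, acc in extrapolate(3):
    print(f"step {step}: {params / 1e12:7.1f}T params, ~{acc:.0%} on 5-digit")
```

Three steps give about 427 trillion parameters, matching the "about 400 trillion" estimate. Under this constant-ratio toy model, 5-digit accuracy reaches roughly 80% at that point, in the same ballpark as "something like 3 more scalings".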

HDMI Cable (8d): That's fair. Depending on your stance on Moore's Law or supercomputers, 400 trillion parameters might or might not be plausible (not really, IMO). But this assumes that there are no advances in the model architecture (maybe changes to the tokenizer?) that would drastically improve the performance of multiplication / other types of math.

What will GPT-4 be incapable of?

That's a good one. What would be a claim you are less confident about (less than 80%) but still confident enough to bet $100 on at 2:1 odds? For me it would be "GPT-4 would beat a random Go bot 99% of the time (in 1000 games) given the right input of less than 1000 bytes."

ChristianKl (7d): It takes a human hundreds of hours to get to that level of play strength. Unless part of building GPT-4 involves letting it play various games against itself for practice, I would be very surprised if GPT-4 could even beat an average Go bot (let's say one that plays at 5 kyu on KGS, to be more specific) 50% of the time. I would put my confidence in that at something like 95%, and most of the remaining percent is about OpenAI doing weird things to make GPT-4 specifically good at the task.

Alexei (8d): For me it would be: GPT-4 would propose a legal next move in all 1000 random chess games.

What will GPT-4 be incapable of?

A model released on openai.com with "GPT" in the name before the end of 2022. It could be named GPT-X, where X is a new name for GPT-4, but it should be an iteration on GPT-3 and should have at least 10x more parameters.

An Informal Introduction to Solomonoff Induction and Pascal Mugging

(Note to mods: ideally I would prefer to have larger LaTeX equations, not sure how to do that. If someone could just make those bigger, or even replace the equation screenshot with real LaTeX, that would be awesome.)

Sure, I agree that keeping your system's predictions to yourself makes more sense, and tweeting doesn't necessarily help. Maybe what I'm pointing at is a case where the text you're tweeting is not necessarily "predictions" but some "manipulation text" to maximize profit in the short term. Let's say you tweet "buy dogecoin" like Elon Musk, so the price goes up and you can sell all of your doge when you predicted the price would drop. I'm not really sure how such planning would work, and exactly what to feed to some NLP model to manipulate the market in such a way... bu...

yes that's 50 million dollars

Daniel Kokotajlo (21d): How much did they start with / invest? How many years did it take? Were they part of a larger group (e.g. quant firm)?

More generally, there's a difference between things being true and being useful. Believing that sometimes you should not update isn't a really useful habit as it forces the rationalizations you mentioned.

Another example is believing "willpower is a limited quantity" vs. "it's a muscle and the more I use it the stronger I get". The first belief will push you towards not doing anything, which is similar to the default mode of not updating in your story.

--It seems pretty likely that it will be for humans (something that works for mice wouldn't be impressive enough for an announcement). In last year's white paper they were already inserting electrode arrays in the brain. But maybe you mean something that lives inside the brain independently? (90%)

--If by "significative damage" you mean "not altering basic human capabilities" then it sounds plausible. From the white paper th...

Daniel Kokotajlo (8mo): I was thinking they'll probably show off a monkey with the device doing something. Last year they were talking about how they were working on getting it safe enough to put in humans without degrading quickly and/or causing damage IIRC; thus I'd be surprised if they already have it in a human, surely there hasn't been enough time to test its safety... IDK. Your credences seem reasonable.

Matt Botvinick on the spontaneous emergence of learning algorithms

Funnily enough, I wrote a blog post distilling what I learned from reproducing experiments from that 2018 Nature paper, adding some animations and diagrams. I especially look at the two-step task and the Harlow task (the one with monkeys looking at a screen), and also try to explain some brain things (e.g. how dopamine (DA) interacts with the PFN) at the end.

OpenAI announces GPT-3

An HN comment, unsure about the meta-learning generalization claims, says that OpenAI has a "serious duty [...] to frame their results more carefully".

So, the paper title is "Language Models are Few-Shot Learners" and this commenter's suggested "more conservative interpretation" is "Lots of NLP Tasks are Learned in the Course of Language Modeling and can be Queried by Example." Now, I agree that version states the thesis more clearly, but it's pretty much saying the same thing. It's a claim about properties fundamental to language models, not about this specific model. I can't fully evaluate whether the authors have enough evidence to back that claim up but it's an interesting and plausible idea, and I don't think the framing is irresponsible if they really believe it's true.

Raemon's Shortform

Re working memory: I never thought of it in the context of conversations, interesting. It seems that we sometimes hold the nodes of the conversation tree in mind so we can go back to them afterward. And maybe if you're introducing new concepts while you're talking, people need to hold those definitions in working memory as well.

What would flourishing look like in Conway's Game of Life?

Some friends tried (inconclusively) to apply AlphaZero to a two-player GoL. I can put you in touch if you want their feedback.

sudhanshu_kasewa (1y): Thanks for the note. I'll let you know if my explorations take me that way.

Michaël Trazzi's Shortform

Thanks for the tutorial to download documentation, I've never done that myself so will check it out next time I go offline for a while!

I usually just run Python to look at docs: I import the library and then call `help(lib.module.function)`. If I don't really know what a class can do, I usually call `dir(class_instance)` to find the available methods/attributes, and then use `help` on them.

This only works if you know reasonably well where to look. If I were you, I would try loading the "Read the Docs" HTML build offline in your browser (it might be searchable this way), but then you still have a browser open (so you would really need to turn off wifi).
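For reference, the `help`/`dir` workflow described above as a small script, using `json` purely as an example module. `pydoc.render_doc` returns the same plain-text page that `help()` prints, so it can be saved or searched offline.

```python
import json
import pydoc

# dir() lists attribute names; filter out private and dunder ones.
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# help(json.dumps) prints the docs interactively; pydoc.render_doc returns
# the same text page as a string, handy for grepping without a browser.
doc_text = pydoc.render_doc(json.dumps, renderer=pydoc.plaintext)
print(doc_text.splitlines()[0])
```

From a terminal, `python -m pydoc json.dumps` prints the same page, and `python -m pydoc -b` serves the docs of every installed module as local HTML pages, which may cover the "searchable docs in a browser" case without a network connection.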

How to do remote co-working

Thanks for writing this up!

I've personally tried Complice coworking rooms where people synchronize on pomodoros and chat during breaks, especially EA France's study room (+discord to voice chat during breaks) but there's also a LW study hall: https://complice.co/rooms

SoerenMind (1y): Yes, these are also great options. I used them in the past but somehow didn't keep it up. Co-working with a friend is a good option for people like myself who benefit from having someone who expects me to be there (and who I'm socially comfortable with).

Michaël Trazzi's Shortform

I've been experimenting with offline coding recently, sharing some of my conclusions.

Why I started:

1) Most of the programming I do at the moment only needs a terminal and a text editor. I'm implementing things from scratch without needing libraries, and I noticed I could just read the docs offline.
2) I came to the conclusion that googling things wasn't worth the cost of having a web browser open: using the outside view, when I look back at all the instances of coding with the internet in easy access, I always end up being distracted,...

The Epistemology of AI risk

Thanks for all the references! I don't have much time to read all of them right now, so I can't really engage with the specific arguments for rejecting the use of utility functions / the study of recursive self-improvement.

I essentially agree with most of what you wrote. There may be a slight disagreement about how you framed (not what you meant) the way research focus has shifted since 2014.

I see Superintelligence as essentially saying "hey, there is problem A. And even if we solve A, then we might also have B. And given C and D, there might be E."...

The Epistemology of AI risk

This framing really helped me think about gradual self-improvement, thanks for writing it down!

I agree with most of what you wrote. I still feel that in the case of an AGI re-writing its own code there's some sense of intent that hasn't been explicitly happening for the past thousand years.

Agreed, you could still model Humanity as some kind of self-improving Human + Computer Colossus (cf. Tim Urban's framing) that somehow has some agency. But it's much less effective at improving itself, and it's not thinking "yep, I need...

The Epistemology of AI risk

I get the sense that the crux here is more between fast / slow takeoffs than unipolar / multipolar scenarios.

In the case of a gradual transition into more powerful technology, what happens when the children in your analogy discover recursive self-improvement?

Matthew Barnett (1y): Even recursive self improvement can be framed gradually. Recursive technological improvement is thousands of years old. The phenomenon of technology allowing us to build better technology has sustained economic growth. Recursive self improvement is simply a very local form of recursive technological improvement. You could imagine systems will gradually get better at recursive self improvement. Some will improve themselves sort-of well, and these systems will pose risks. Some other systems will improve themselves really well, and pose greater risks. But we would have seen the latter phenomenon coming ahead of time. And since there's no hard separation between recursive technological improvement and recursive self improvement, you could imagine technological improvement getting gradually more local, until all the relevant action is from a single system improving itself. In that case, there would also be warning signs before it was too late.

The Epistemology of AI risk

When you say "the last few years has seen many people here" for your 2nd/3rd paragraph, do you have any posts / authors in mind to illustrate?

I agree that there has been a shift in what people write about because the field grew (as Daniel Filan pointed out). However, I don't remember reading anyone dismiss convergent instrumental goals such as increasing your own intelligence, or dismiss utility functions as a useful abstraction for thinking about agency.

In your thread with ofer, he asked what the difference was between using loss functions in neural nets vs. objective functions / utility functions, and I haven't fully grasped your opinion on that.

Matthew Barnett (1y): For the utility of talking about utility functions, see this rebuttal [https://www.lesswrong.com/s/4dHMdK5TLN6xcqtyc/p/NxF5G6CJiof6cemTw] of an argument justifying the use of utility functions by appealing to the VNM-utility theorem, and a few [https://www.lesswrong.com/posts/Q9JKKwSFybCTtMS9d/what-are-we-assuming-about-utility-functions] more [https://www.lesswrong.com/posts/4K52SS7fm9mp5rMdX/three-ways-that-sufficiently-optimized-agents-appear] posts [https://www.lesswrong.com/posts/vphFJzK3mWA4PJKAg/coherent-behaviour-in-the-real-world-is-an-incoherent] expanding the discussion. The CAIS paper [https://www.lesswrong.com/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as] argues that we shouldn't model future AI as having a monolithic long-term utility function. But it's by no means a settled debate. For the rejection of stable self improvement as a research priority, Paul Christiano wrote a post [https://www.lesswrong.com/posts/5bd75cc58225bf0670374e92/stable-self-improvement-as-a-research-problem] in 2014 where he argued that stable recursive self improvement will be solved as a special case of reasoning under uncertainty. And again, the CAIS model proposes that technological progress will feed into itself (not unlike what already happens), rather than a monolithic agent improving itself. I get the impression that very few people outside of MIRI work on studying stable recursive self improvement, though this might be because they think it's not their comparative advantage. There's a difference between accepting something as a theoretical problem, and accepting that it's a tractable research priority. I was arguing that the type of work we do right now might not be useful for future researchers, and so I wasn't trying to say that these things didn't exist. Rather, it's not clear that productive work can be done on them right now. My evidence was that the way we think about these problems has changed over the years.
Of course, you
The Epistemology of AI risk

> the ones you mentioned

To be clear, this is a linkpost for Philip Trammell's blogpost. I'm not involved in the writing.

Matthew Barnett (1y): Apologies for the confusing language, I knew.

The Epistemology of AI risk

> As you say

To be clear, the author is Philip Trammell, not me. Added quotes to make it clearer.

Ultra-simplified research agenda

Having printed and read the full version, I found this ultra-simplified version a useful summary.

Happy to read a (not-so-)simplified version (like 20-30 paragraphs).

Eli Tyre (1y): hahahahahah.

AI Alignment "Scaffolding" Project Ideas (Request for Advice)
> A comprehensive AI alignment introductory web hub

RAISE and Robert Miles provide introductory content. You can think of LW->alignment forum as "web hubs" for AI Alignment research.

> formal curriculum

There was a course on AGI Safety last fall in Berkeley.

> A department or even a single outspokenly sympathetic official in any government of any industrialized nation

You can find a list of institutions/donors here.

> A list of concrete and detailed policy proposals related to AI alignment

I would recommend reports from FHI/GovAI as a starting point.

Would th
Modeling AI milestones to adjust AGI arrival estimates?

You can find AGI predictions, including Starcraft forecasts, in "When Will AI Exceed Human Performance? Evidence from AI Experts". Projects for having "all forecasts on AGI in one place" include ai.metaculus.com & foretold.io.

moridinamael (2y): Looks like all of the "games"-oriented predictions that were supposed to happen in the first 25 years have already happened within 3. Edit: Misread the charts. It's more like the predictions within the first ~10 years have already been accomplished, plus or minus a few.

Problems with Counterfactual Oracles

1. Proposals should make superintelligences less likely to fight you by using some conceptual insight that holds in most cases.
2. With CIRL, this insight is "we want the AI to actively cooperate with humans", so there's real value in it being formalized in a paper.
3. In the counterfactual paper, there's the insight "what if the AI thinks it's not on but still learns".

For the last bit, I have two interpretations:

4.a. However, it's unclear that this design avoids all manipulative behaviour
TurnTrout (2y): It's more like 4a. The line of thinking seems useful, but I'm not sure that it lands.

Problems with Counterfactual Oracles

The zero reward is in the paper. I agree that skipping would solve the problem. From talking to Stuart, my impression is that he thinks the zero reward would be equivalent to skipping for specifying "no learning", or would just slow down learning. My disagreement is that I think it can confuse learning to the point of not learning the right thing.
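A toy illustration of the "zero reward vs. skipping" question, not the paper's setup: a counterfactual oracle modelled as a 3-armed bandit whose answer is erased with probability `EPSILON` and only then scored automatically. `TRUE_REWARD`, `EPSILON` and `train` are hypothetical. When erasure is independent of the answer, giving 0 reward on non-erasure episodes shrinks every estimate toward `EPSILON` times its true value (same ranking, noisier estimates), while skipping the update recovers the rewards directly. The toy does not model richer settings where zero rewards might make the oracle learn the wrong thing.

```python
import random

# Hypothetical per-answer rewards; answer 1 is the best prediction.
TRUE_REWARD = [-1.0, -0.2, -0.5]
EPSILON = 0.1  # probability that the answer is erased (never read by humans)


def train(n_episodes, non_erasure="zero", seed=0):
    """Return per-answer value estimates (means of the rewards used for updates).

    non_erasure="zero": humans read the answer, oracle gets reward 0 and updates.
    non_erasure="skip": humans read the answer, no learning step at all.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(TRUE_REWARD)
    counts = [0] * len(TRUE_REWARD)
    for episode in range(n_episodes):
        answer = episode % len(TRUE_REWARD)   # round-robin exploration
        erased = rng.random() < EPSILON       # same draw in both variants
        if erased:
            reward = TRUE_REWARD[answer]      # automatic counterfactual scoring
        elif non_erasure == "zero":
            reward = 0.0
        else:
            continue                          # skip the update entirely
        totals[answer] += reward
        counts[answer] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


zero = train(30_000, "zero")
skip = train(30_000, "skip")
print("zero-reward estimates:", [round(v, 3) for v in zero])  # ~EPSILON * true
print("skip estimates:       ", [round(v, 3) for v in skip])  # ~true rewards
```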

> Why not do a combination of pre-training and online learning, where you do enough during the training phase to get a useful predictor, and then use online learning to deal with subsequent distributional shifts?

Problems with Counterfactual Oracles

The string is read with probability 1-

Problems with Counterfactual Oracles

Yes, if we choose the utility function to make it a CDT agent optimizing for the reward for one step (a particular case of act-based agents), then it won't care about future versions of itself nor want to escape.

I agree with the intuition of shutting down to make it episodic, but I am still confused about the causal relationship between "having the rule to shutdown the system" and "having a current timestep maximizer". For it to really be a "current timestep maximizer" it needs to be in some kind of reward/utility function. Beca...

Problems with Counterfactual Oracles

The Asymptotically Unambitious AGI thread helped me clarify my thoughts, thanks. I agree that an optimal CDT agent won't think about future versions, and I don't see any optimization pressure towards escape messages nor disproportionately common "escape message" regions.

However, it still assumes we have access to this magic oracle that optimizes for where is the event where humans don't see the answer, its indicator function, and the counterfactual reward (given by the automatic machine). If humans were able to build ...

Wei_Dai (2y): Why do we have to give the oracle a zero reward for the non-erasure episodes? Why not just skip the learning/update step for those episodes? Why not do a combination of pre-training and online learning, where you do enough during the training phase to get a useful predictor, and then use online learning to deal with subsequent distributional shifts? Although I guess that probably isn't really original either. What seems original is that during any episode where learning will take place, don't let humans (or any other system that might be insecure against the oracle) see the oracle's output until the episode is over.

RyanCarey (2y): The escape action being randomly called should not be a problem if it is a text string that is only read if r=1, and is ineffectual otherwise...

Stories of Continuous Deception

I agree that these stories won't (naturally) lead to a treacherous turn. Continuously learning to deceive (an ML failure in this case, as you mentioned) is a different result. The story/learning would have to be substantially different to lead to "learning the concept of deception" (reaching an AGI-level ability to reason about such abstract concepts), but maybe there's a way to learn those concepts with only narrow AI.

I included dates such as 2020 to 2045 to make it more concrete. I agree that weeks (instead of years) would give a more accurate representation as current ML experiments take a few weeks tops.

The scenario I had in mind is "in the context of a few-week ML experiment, I achieved human intelligence and realized that I need to conceal my intentions/capabilities, and I still don't have a decisive strategic advantage". The challenge would then be "how to conceal my human-level intelligence before everything I have discovered is thrown away"...

A Treacherous Turn Timeline - Children, Seed AIs and Predicting AI

Your comment makes a lot of sense, thanks.

I put step 2. before step 3. because I thought something like "first you learn that there is some supervisor watching, and then you realize that you would prefer him not to watch". Agreed that step 2. could happen only by thinking.

Yep, deception is about alignment, and I think that most parents would be more concerned about alignment than about improving the tactics. However, I agree that if we take "education" in a broad sense (including high school, college, etc.), it's unofficially about tacti...

A Treacherous Turn Timeline - Children, Seed AIs and Predicting AI

I meant:

"In my opinion, the disagreement between Bostrom (treacherous turn) and Goertzel (sordid stumble) originates from the uncertainty about how long steps 2. and 3. will take"

That's an interesting scenario. Instead of "won't see a practical way to replace humanity with its tools", I would say "would estimate its chances of success to be < 99%". I agree that we could say that it's "honestly" making humans happy, in the sense that it understands that this maximizes expected value. However, it knows...

countingtoten (2y): Smiler AI: I'm focusing on self-improvement. A smarter, better version of me would find better ways to fill the world with smiles. Beyond that, it's silly for me to try predicting a superior intelligence.

countingtoten (2y): Mostly agree, but I think an AGI could be subhuman in various ways until it becomes vastly superhuman. I assume we agree that no real AI could consider literally every possible course of action when it comes to long-term plans. Therefore, a smiler could legitimately dismiss all thoughts of repurposing our atoms as an unprofitable line of inquiry, right up until it has the ability to kill us. (This could happen even without crude corrigibility measures, which we could remove or allow to be absent from a self-revision because we trust the AI.) It could look deceptively like human beings deciding not to pursue an Infinity Gauntlet to snap our problems away.

This thread is to discuss "How useful is quantilization for mitigating specification-gaming? (Ryan Carey, Apr. 2019, SafeML ICLR 2019 Workshop)"

This thread is to discuss "Quantilizers (Michaël Trazzi & Ryan Carey, Apr. 2019, Github)".

This thread is to discuss "When to use quantilization (Ryan Carey, Feb. 2019, LessWrong)"

This thread is to discuss "Quantilal control for finite MDPs & Computing an exact quantilal policy (Vanessa Kosoy, Apr. 2018, LessWrong)"

This thread is to discuss "Reinforcement Learning with a Corrupted Reward Channel (Tom Everitt; Victoria Krakovna; Laurent Orseau; Marcus Hutter; Shane Legg, Aug. 2017, arXiv; IJCAI)"

This thread is to discuss "Thoughts on Quantilizers (Stuart Armstrong, Jan. 2017, Intelligent Agent)"

This thread is to discuss "Another view of quantilizers: avoiding Goodhart's Law (Jessica Taylor, Jan. 2016, Intelligent Agent Foundations Forum)"