All of Søren Elverlin's Comments + Replies

I'll do both:

  1. I (again) affirm that this is very speculative.
  2. A substantial part of my private evidence is my personal evaluation of the CEO of OpenAI. I am really uneasy about stating this in public, but I now regret keeping my very negative evaluation of SBF private. Speak the truth, even if your voice trembles. I think a full "Heel turn" is more likely than not.

Bostrom's definition of the control problem in 'Superintelligence' only refers to "harming the project's interests", which, as you rightly point out, is broader than existential risk. However, the immediate context makes it clear that Bostrom is discussing existential risk. The "harm" referred to does not include things like gender bias.

On reflection, I don't actually believe that AI Alignment has ever exclusively referred to existential risk from AI. I do believe that talk about "AI Alignment" on LessWrong has usually primarily been about existential risk. I further thin... (read more)

2 · paulfchristiano · 9d
The control problem is initially introduced as: "the problem of how to control what the superintelligence would do." In the chapter you reference it is presented as the principal agent problem that occurs between a human and the superintelligent AI they build (apparently the whole of that problem). It would be reasonable to say that there is no control problem for modern AI because Bostrom's usage of "the control problem" is exclusively about controlling superintelligence. On this definition either there is no control research today, or it comes back to the implicit controversial empirical claim about how some work is relevant and other work is not. If you are teaching GPT to better understand instructions I would also call that improving its capability (though some people would call it alignment, this is the de dicto vs de re distinction discussed here [https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6]). If it already understands instructions and you are training it to follow them, I would call that alignment. I think you can use AI alignment however you want, but this is a lame thing to get angry at labs about and you should expect ongoing confusion.

I was unclear. Let me elaborate:

"AGI-Completeness" is the idea that a large class of tasks have the same difficulty, roughly analogous to "Turing-Completeness" and "NP-Completenes".

My claim in the post is that I doubt OpenAI's hope that the task "Alignment Research" will turn out to be strictly easier than any dangerous task.

My claim in my comment above refers to the relative difficulty of 2 tasks:

  1. Make a contribution to Alignment Research comparable to the contribution of the book 'Superintelligence'.
  2. Drive from NY to SF without human intervention except
... (read more)

I fully agree that it is a factual question, and OpenAI could easily shed light on the circumstances around the launch if they chose to do so.

Maybe the underlying reason we are interpreting the evidence in different ways is that we are holding OpenAI to different standards:

Compared to a standard company, having a feedback button is evidence of competence. Quickly incorporating training data is also a positive update, as is having an explicit graphical representation of illegitimate questions.

I am comparing OpenAI to the extremely high standard of "Being able to solve the alignment problem". Against this standard, having a feedback button is absolutely expected, and even things like Eliezer's suggestion (publishing hashes of your gambits) should be obvious to companies competent enough to have a chance of solving the alignment problem.

6 · ChristianKl · 10d
It's important to be able to distinguish factual questions from questions about judgments. "Did the OpenAI release happen the way OpenAI expected?" is a factual question that has nothing to do with the question of what standards we should have for OpenAI. If you get the factual questions wrong, it's very easy for people within OpenAI to dismiss your arguments.

On reflection, I agree that it is only weak evidence. I agree we know nothing about damage. I agree that we have no evidence that this wasn't the planned strategy. Still, the evidence the other way (that this was deliberate to gather training data) is IMHO weaker.

My point in the "Review" section is that OpenAI's plan committed them to transparency about these questions, and yet we have to rely on speculations.

5 · ChristianKl · 10d
I find the fact that they used the training data to massively reduce the "jailbreak" cases within a short time to be evidence that the point of the exercise was to gather training data. ChatGPT has a mode where it labels your question as illegitimate and colors it red but still gives you an answer. Then there's the feedback button to tell OpenAI if it made a mistake. This behavior prioritizes gathering training data over not giving any problematic answers.

Eliezer: OpenAI probably thought they were trying hard at precautions; but they didn't have anybody on their team who was really creative about breaking stuff, let alone anyone as creative as the combined Internet; so it got jailbroken in like a day after something smarter looked at it.

3 · Heighn · 10d
Seems to me Yudkowsky was (way) too pessimistic about OpenAI there. They probably knew something like this would happen.

I think this is very weak evidence. As far as I know, "jailbreaking it" did no damage; at least I haven't seen anybody point to any damage created. On the other hand, it did give OpenAI training data it could use to fix many of the holes.

Even if you don't agree with that strategy, I see no evidence that this wasn't the planned strategy.

4 · lc · 10d
That's not even an assertion that it didn't go as they expected, let alone an explanation of why one would assume that.

I haven't seen a rigorous treatment of the concept of AGI-completeness. Here are some suggested AGI-complete problems:

I don't have a solid answer, but I would be surprised if the task "Write the book 'Superintelligence'" required less general intelligence than "full self-driving from NY to SF".

4 · awg · 10d
I'm interested in why you would think that writing "Superintelligence" would require less GI than full self-driving from NY to SF. The former seems like a pretty narrow task compared to the latter.

I would be excited to see Rational Animations try to cover the Hard Problem of Corrigibility: https://arbital.com/p/hard_corrigibility/

I believe that this would be the optimal video to create for the optimization target "reduce probability of AI-Doom". It seems (barely) plausible that someone really smart could watch the video, make a connection to some obscure subject none of us know about, and then produce a really impactful contribution to solving AI Alignment.

Talking concretely, what does a utility function look like that is so close to a human utility function that an AI system has it after a bunch of training, but which is an absolute disaster?

A simple example could be that the humans involved in the initial training are negative utilitarians. Once the AI is powerful enough, it would be able to implement omnicide rather than just curing diseases.

Thus in order to arrive at a conclusion of doom, it is not enough to argue that we cannot align AI perfectly.

I am open to being corrected, but I do not recall ever seeing a requirement of "perfect" alignment in the cases made for doom. Eliezer Yudkowsky in "AGI Ruin: A List of Lethalities" only asks for 'this will not kill literally everyone'.

2 · Jeff Rose · 3mo
My impression is that there has been a variety of suggestions about the necessary level of alignment. It is only recently that "don't kill most of humanity" has been suggested as a goal, and I am not sure that the suggestion was meant to be taken seriously. (Because if you can do that, you can probably do much better; the point of that comment as I understand it was that we aren't even close to being able to achieve even that goal.)

Without investigating these empirical details, it is unclear whether a particular qualitatively identified force for goal-directedness will cause disaster within a particular time.

A sufficient criterion for a desire to cause catastrophe (distinct from having the means to cause catastrophe) is that the AI is sufficiently goal-directed to be influenced by Stephen Omohundro's "Basic AI Drives".

For instance, take an entity with a cycle of preferences, apples > bananas = oranges > pears > apples. The entity notices that it sometimes treats oranges as better than pears and sometimes worse. It tries to correct by adjusting the value of oranges to be the same as pears. The new utility function is exactly as incoherent as the old one.

It is possible that an AI will try to become more coherent and fail, but we are worried about the most capable AI and cannot rely on the hope that it will fail such a simple task. Being coherent is easy if the fruits are instrumental: Just look up the prices of the fruits.
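
To make the fruit example concrete, here is a minimal sketch (my own illustration, not from the original post) of a coherence check. It treats incoherence as a cycle of weak preferences that contains at least one strict preference, and it confirms that setting oranges equal to pears leaves the relation exactly as incoherent as before:

```python
# Hypothetical helper for checking preference coherence (illustrative only).
def incoherent(weak, strict):
    """weak: set of (a, b) pairs meaning "a is at least as good as b".
    strict: subset of weak meaning "a is strictly better than b".
    The relation is incoherent if some weak-preference cycle contains a
    strict preference -- transitivity would then force an option to beat itself."""
    def reaches(src, dst):
        seen, stack = {src}, [src]
        while stack:
            x = stack.pop()
            if x == dst:
                return True
            for (a, b) in weak:
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False
    # A strict edge a > b lying on a weak cycle (b weakly reaches a) is a contradiction.
    return any(reaches(b, a) for (a, b) in strict)

# Original cycle: apples > bananas = oranges > pears > apples
weak_before = {("apples", "bananas"), ("bananas", "oranges"), ("oranges", "bananas"),
               ("oranges", "pears"), ("pears", "apples")}
strict_before = {("apples", "bananas"), ("oranges", "pears"), ("pears", "apples")}

# "Fix": the entity sets oranges equal to pears (indifference in both directions)
weak_after = {("apples", "bananas"), ("bananas", "oranges"), ("oranges", "bananas"),
              ("oranges", "pears"), ("pears", "oranges"), ("pears", "apples")}
strict_after = {("apples", "bananas"), ("pears", "apples")}

print(incoherent(weak_before, strict_before))  # True
print(incoherent(weak_after, strict_after))    # True -- the adjustment did not help
```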

However if we think that utility maximization is difficult to wield without great destruction, then that suggests a disincentive to creating systems with behavior closer to utility-maximization. Not just from the world being destroyed, but from the same dynamic causing more minor divergences from expectations, if the user can’t specify their own utility function well.

A strategically aware utility maximizer would try to figure out what your expectations are, satisfy them while preparing a take-over, and strike decisively without warning. We should not expect to see an intermediate level of "great destruction".

I prefer "AI Safety" over "AI Alignment" because I associate the first more with Corrigibility, and the second more with Value-alignment.

It is the term "Safe AI" that implies 0% risk, while "AI Safety" seems more similar to "Aircraft Safety" in acknowledging a non-zero risk.

7 · Rob Bensinger · 4mo
I agree that corrigibility, task AGI, etc. is a better thing for the field to focus on than value learning. This seems like a real cost of the term "AI alignment", especially insofar as researchers like Stuart Russell have introduced the term "value alignment" and used "alignment" as a shorthand for that.

The epistemic shadow argument further requires that the fast takeoff leads to something close to extinction.

This is not the least impressive thing I expect GPT-4 won't be able to do.

I should have explained what I mean by "always (10/10)": If you generate 10 completions, you expect with 95% confidence that all 10 satisfy the criteria.

All the absolute statements in my post should be turned down from 100% to 99.5%. My intuition is that if less than 1 in 200 ideas are valuable, it will not be worthwhile to have the model contribute to improving itself.
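
As a rough consistency check (my own arithmetic, assuming the 10 completions are independent), a per-completion threshold of 99.5% matches the "all 10 with 95% confidence" reading almost exactly:

```python
# 99.5% per completion -> probability that all 10 independent completions qualify
p_single = 0.995
p_all_ten = p_single ** 10
print(f"{p_all_ten:.3f}")  # ~0.951, i.e. roughly the 95% confidence level above
```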

Intelligence Amplification

GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.

  • If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
  • If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
  • Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
  • I
... (read more)
3 · jacob_cannell · 5mo
I think you are probably overconfident, mostly because of the use of the term 'every' in some of these clauses. Consider that if GPT-4 is trained on arxiv, it could plausibly make many, many research suggestions. And all it would need to do in order to disprove the extremely generally worded clause 3 would be to eventually generate one such research suggestion that improves 'compute' (hardware or software efficiency), which eventually becomes a certainty with enough suggestions. So essentially you are betting that GPT-4 is not trained on arxiv.
9 · Lone Pine · 5mo
There's a bit of an epistemic shadow here. If a capability is publicly announced and available, then it can't be the keystone to a fast takeoff.

The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks.

This probably refers to figure 9 in A Generalist Agent, which compares generalization given:

  1. Training in irrelevant domain (Blue line)
  2. Training in relevant domain (Green line)
  3. Training in both domains (Yellow line)

From DeepMind's results in the figure, it looks like 3. almost always outperforms 2., though I would hesitate to draw strong conclusions from this figur... (read more)

In-universe, Mecha-Godzilla had to be built with a Godzilla-skeleton, which caused both to turn against Humanity.

It feels probable that there will be substantial technical similarities between Production Superintelligences and Alignment Superintelligences, which could cause both of them to turn against us.
(Epistemic state: Low confidence)

The inclusion criteria states:

Tasks that are completely beyond the capabilities of current language models are also encouraged

It's easy to come up with a benchmark that requires a high but unspecified level of intelligence. An extreme example would be to ask for a proof that P!=NP - we have no idea about the difficulty of the task, though we suspect that it requires superintelligence. To be valuable, the challenge of a benchmark needs to be possible to relate to meaningful capabilities, such as "The Human Level".

Most people couldn't answer questions about... (read more)

9 · RomanS · 8mo
You're right. And some of the existing tasks in the benchmark are way beyond the abilities of baseline humans (e.g. the image classification task where images are hex-encoded texts). On the other hand, the organizers allowed the human testers to use any tool they want, including internet search, software etc. So, the measured top-human performance is the performance of humans augmented with technology. I think an AI that can solve BIG-bench must be an AGI. But there could be an AGI that can't solve BIG-bench yet.

Thank you for a very thought-provoking post.

My layman understanding of the VDV is that their goals are primarily political ("Anti-coup") and meant for rapid deployment to counter uprisings etc. rather than maximizing military effectiveness. This reflects how they were used in Ukraine - contrary to their expectations, this was a real war and not an uprising.

Giving disproportionate resources to "Republican Guard" units seems like a common pattern in authoritarian countries.

3 · Davis_Kingsley · 9mo
I think it's somewhat complicated -- the VDV is also used in conventional operations thanks to its elite and volunteer status (see for instance this primer on Russian military methods [https://www.rand.org/content/dam/rand/pubs/perspectives/PE200/PE231/RAND_PE231.pdf] ), which makes them more reliable and effective than conscript forces even in some more "conventional" tasks. In some ways this might be considered similar to the structure of the post-WWII French military, where the paratroopers and the Foreign Legion were made up of volunteers and used preferentially over conscript forces -- indeed, as I understand it France did not use conscripts at all in the Indochina War [https://en.wikipedia.org/wiki/First_Indochina_War], and favored using its "more reliable" volunteer units in the Algerian War, [https://en.wikipedia.org/wiki/Algerian_War] with the infamous Battle of Algiers [https://en.wikipedia.org/wiki/Battle_of_Algiers_(1956%E2%80%931957)] conducted primarily by paratroopers. (Ironically, the reliability of these units in combat did not mean political reliability -- when the French government eventually decided to grant Algerian independence, some of the paratroopers joined a coup attempt! [https://en.wikipedia.org/wiki/Algiers_putsch_of_1961]) At the same time though, Russia has invested substantially in technological capabilities for its airborne forces to assist in their primary airborne mission, with things like the BMD- and BTR- series of airborne APCs/IFVs, multi-canopy and rocket-assisted parachutes to allow these vehicles to be dropped (in some cases with crews inside!), and so on.

I think we are close to agreeing with each other on how we expect the future to look. I certainly agree that real world impact is discontinuous in metrics, though I would blame practical matters rather than poor metrics.

I only have a vague idea what is meant by language models contributing to GDP.

Current language models are actually quite reliable when you give them easy questions. Practical deployment of language models are sometimes held to very high standards of reliability and lack of bias, possibly for regulatory, social or other practical reasons. Yet I personally know someone who works in customer service and is somewhat racist and not very reliable.

I am not sure I understand your counterbet. I would guess most translation is already automated, most programmers use automated tools already and most Internet "journalism" is already computer generated.

I claim that most coordination-tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all relevant information into the prompts and transferring the completions to "the real world".

I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards.

Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would have a very slow effect on GDP.... (read more)

2 · Ege Erdil · 10mo
2 standard deviations above the human average with respect to what metric? My whole point is that the metrics people look at in ML papers are not necessarily relevant in the real world and/or the real world impact (say, in revenue generated by the models) is a discontinuous function of these metrics. I would guess that 2 standard deviations above human average on commonly used language modeling benchmarks is still far from enough for even 10% of coordination tasks, though by this point models could well be generating plenty of revenue.

Thank you for this important caveat. As an imperfect bayesian, I expect that if I analyzed the benchmark, I would update towards a belief that the results are real, but less impressive than the article makes them appear.

:)

Assume that as a consequence of being in the Paul-verse, regulatory and other practical obstacles are possible to overcome in a very cost-effective way. In this world, how much value do current language models create?

I would answer that in this obstacle-free world, they create about 10% of global GDP and this share would be rapidly increasing. This is because a large set of valuable tasks are both simple enough that models could understand them, and possible to transform into a prompt completion task.

The argument is meant as a reductio: Language models d... (read more)

2 · Ege Erdil · 10mo
I don't agree with that at all. I think in this counterfactual world current language models would create about as much value as they create now, maybe higher by some factor but most likely not by an order of magnitude or more. I know this is what your argument is. For me the conclusion implied by "language models don't create value in our world" is "language models are not capable of creating value in our world & we're not capable of using them to create value", not that "the practical obstacles are hard to overcome". Also, this last claim about "practical obstacles" is very vague: if you can't currently buy a cheap ticket to Mars, is that a problem with "practical obstacles being difficult to overcome" or not? In some sense there's likely a billion dollar company idea which would build on existing language models, so if someone thought of the idea and had the right group of people to implement it they could be generating a lot of revenue. This would look very different from language models creating 10% of GDP, however. I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards. I'll be happy to claim victory when AGI is here and we're not all dead.
3 · Dirichlet-to-Neumann · 10mo
What exactly do you mean by "create 10% of global GDP"? And why would you expect the current quite unreliable language models to have such a drastic effect? Anyway, I will counterbet that by 2032 most translation will be automated (90%), most programmers will use automated tools daily (70%), most top-level mathematics journals will use proof-checking software as part of their reviewing process (80%), and computer-generated articles will make up a majority of Internet "journalism" (50%).

I struggle to understand your first sentence. Do you cash out "Useful" as "Having the theoretical ability to do a task"? As in: "If an AI benchmarks better than humans at a task, but doesn't generate revenue, the reason must be that the AI is not actually capable of doing the task".

In the Paul-verse, how does AI contribute substantially to GDP at AI capability levels between "Average Human" and "Superintelligence"?

It seems (to me) that the reasons are practical issues, inertia, regulatory, bureaucracy, conservatism etc., and not "Lack of AI Capability". As a... (read more)

8 · jeff8765 · 10mo
I think the issue here is that the tasks in question don't fully capture everything we care about in terms of language facility. I think this is largely because even very low probabilities of catastrophic actions can preclude deployment in an economically useful way. For example, a prime use of a language model would be to replace customer service representatives. However, if there is even a one in a million chance that your model will start cursing out a customer, offer a customer a million dollars to remedy an error, or start spewing racial epithets, the model cannot be usefully deployed in such a fashion. None of the metrics in the paper can guarantee, or even suggest, that level of consistency.
2 · Ege Erdil · 10mo
No, I mean that being able to do the task cheaply and at a high quality is simply not that valuable. AI went from being uncompetitive against professional Go players on top-notch hardware to being able to beat them running on a GPU you can buy for less than $100, but the consumer surplus that's been created by this is very small. If AI is already as capable as an average human then you're really not far off from the singularity, in the sense that gross world product growth will explode within a short time and I don't know what happens afterwards. My own opinion (may not be shared by Paul) is that you can actually get to the singularity even with AI that's much worse than humans just because AI is so much easier to produce en masse and to improve at the tasks it can perform. I'll have an essay coming out about takeoff speeds on Metaculus in less than ten days (will also be crossposted to LessWrong) so I'll elaborate more on why I think this way there. Why do you think being above the human average on all language benchmarks is something that should cash out in the form of a big consumer surplus? I think we agree that this is not true for playing Go or recognizing pictures of cats or generating impressive-looking original art, so what is the difference when it comes to being better at predicting the next word in a sentence or at solving logic puzzles given in verbal format? Of course there might not be time, but I'm happy to take you up on a bet (a symbolic one if actual settlement in the event of a singularity is meaningless) at even odds if you think this is more likely than the alternative.

According to this image, the performance is generally above the human average: [image from article]

In the Paul-verse, we should expect that economic interests would quickly cause such models to be used for everything that they can be profitably used for. With better-than-average-human performance, that may well be a doubling of global GDP.

In the Eliezer-verse, the impact of such models on the GDP of the world will remain around $0, due to practical and regulatory constraints, right up until the upper line ("Human (Best)") is surpassed for 1 particular task.

The BIG-Bench paper that those 'human' numbers are coming from (unpublished, quasi-public as TeX here) cautions against taking those averages very seriously, without giving complete details about who the humans are or how they were asked/incentivized to behave on tasks that required specialized skills:

My take as someone who thinks along similar lines to Paul is that in the Paul-verse, if these models aren't being used to generate a lot of customer revenue then they are actually not very useful even if some abstract metric you came up with says they do better than humans on average.

It may even be that your metric is right and the model outperforms humans on a specific task, but AI has been outperforming humans on some tasks for a very long time now. It's just not easy to find profitable uses for most of those tasks, in the sense that the total consumer surplus generated by being able to perform them cheaply and at a high quality is low.

It is possible that Putin's political goals involve dismantling Ukraine along with the complete subjugation of the Ukrainian people. Nuclear weapons could thus have a desired political effect, in addition to their substantial practical effects.

As a first approximation, if the food production falls to a level of X% of the required calories for the population, your probability of surviving is roughly X%.

Even a full counter-value nuclear exchange would not destroy all of our ability to produce food. Cities would be the primary targets, and they are net importers of food. Civilization might not even collapse with the removal of the 3000 largest cities in the western world.

1 · andrew sauer · 10mo
As a first approximation, if the food production falls to a level of X% of the required calories for the population, your probability of surviving is roughly X%.

I mean, today food production is significantly more than 100% of the required calories for the population, and many people are still food insecure.
6 · Davidmanheim · 10mo
...but as a second approximation, post-large-scale nuclear war, if there is only 50% as much food as is required to feed everyone in a major city over a couple months, the ensuing violence and hoarding of the food will likely kill many more people than half. And I'm less sanguine than you about the survival of infrastructure needed to maintain the modern world without major cities, ports, power infrastructure, etc. Sure, there might be enough food overall, but if it's in a different place than you, and there is no modern communications or transport, that doesn't matter much. If you already live on a farm, great, but otherwise you're likely to be in trouble.

I have a higher probability that Putin will launch the first nuke at Kiev. I think he might think that all other scenarios end with a Russian defeat and his personal untimely death. Russia is already a pariah state, and there is comparatively little for him to lose. Nuking Kiev would have the side-effect of making civilians flee urban centers in Ukraine, dramatically increasing the probability of a conventional Russian victory.

2 · rhollerith_dot_com · 10mo
I think Poland is a more likely target because Putin probably still has hopes that Russia and Ukraine will unite some day and (correctly IMHO) anticipates that any use of nukes on Ukrainian territory diminishes that hope whereas he probably has given up on Poland's becoming friendly with Russia. Also, many Russians have family in Ukraine. The reason for attacking Poland would be to reduce the flow of supplies from Poland to the Ukrainian regime, both "directly" by destroying a road or railroad (maybe at a mountain pass or a bridge) or a logistics center in Poland and by deterring the Polish government (and the governments of the 3 other nations on Ukraine's western border) from continuing to supply Kyiv (out of fear of more nuclear strikes). I get the impression that the Russian navy has been preventing ships from supplying Ukraine via the Black Sea to any significant extent. If that is not true, then that would greatly reduce the usefulness of nuking Poland to Russia and consequently would greatly reduce its probability.

Absolutely fascinating link - strong upvote! Han et al., 2013 did not investigate motor control and motion planning, but I agree that human neural cells probably are just better, though possibly requiring more energy.

From martial arts, I'm convinced people have different innate levels of motor control and motion planning, and this helps nontrivially in fights. However, brains and muscles both require energy, and I'd generally give the advantage to the person with +1std muscles over the person with +1std motor control, assuming both are untrained.

no, I think a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal straightforwardly

According to https://en.wikipedia.org/wiki/File:Brain-body_mass_ratio_for_some_animals_diagram.svg the hippopotamus would be the animal that "gained" the most from having a human brain, assuming that brain-body mass ratio indicates intelligence.

I could see the Ronaldo-brained hippopotamus winning by planning, cooperating and learning more from experience, but I'm not seeing a big advantage in a straight-up fight.

I think human neural cells are 'just better'; there's some evidence in mice to this effect:

We found that the human glial chimeras indeed performed better than control mice across a variety of learning tasks, that included auditory fear conditioning, novel object and place recognition, and Barnes maze navigation. In all of these tests - but not in any test of social interactivity or primary perception - the human glial chimeras performed better and acquired new causal associations more quickly than did murine-allografted or untransplanted controls (Han et a

... (read more)

Paul Christiano makes a slightly different claim here: https://www.lesswrong.com/posts/7MCqRnZzvszsxgtJi/christiano-cotra-and-yudkowsky-on-ai-progress?commentId=AiNd3hZsKbajTDG2J

As I read the two claims:

  • With GPT-3 + 5 years of effort, a system could be built that would eventually Foom if allowed.
  • With GPT-3 + a serious effort, a system could be built that would clearly Foom if allowed.

I think the second could be made into a bet. I tried to operationalise it as a reply to the linked comment.

How much time do you see between "1 AI clearly on track to Foom" and "First AI to actually Foom"? My weak guess is Eliezer would say "Probably quite little time", but your model of the world requires the GWP to double over a 4 year period, and I'm guessing that period probably starts later than 2026.

I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.

7 · paulfchristiano · 1y
I think "on track to foom" is a very long way before "actually fooms."

Yes, we are still running, though on a bi-weekly schedule. We will discuss Paul Christiano's "Another (Outer) Alignment failure story" on the 8th of July.

I made my most strident and impolite presentation yet in the AISafety.com Reading Group last night. We were discussing "Conversation with Ernie Davis", and I attacked this part:

"And once an AI has common sense it will realize that there’s no point in turning the world into paperclips..."

I described this as fundamentally mistaken and like an argument you'd hear from a person that had not read "Superintelligence". This is ad hominem, and it pains me. However, I feel like the emperor has no clothes, and calling it out explicitly is important.

3 · Viliam · 2y
Explaining things across long inferential distance is frustrating. The norm that arguments should be opposed by arguments (instead of e.g. ad hominems) is good in general, but sometimes a solid argument simply cannot be constructed in five minutes. At least you have pointed towards an answer...

Thank you for your answer, and good luck with the Alignment Research Center.

In the interview with AI Impacts, you said:

...examples of things that I’m optimistic about that they [people at MIRI] are super pessimistic about are like, stuff that looks more like verification...

Are you still optimistic? What do you consider the most promising recent work?

I don't think my view has changed too much (I don't work in the area so don't pay as much attention or think about it as often as I might like).

The main updates have been:

  • At the time of that interview I think it was public that Interval Bound Propagation was competitive with other verification methods for perturbation robustness, but I wasn't aware of that and definitely hadn't reflected on it. I think this makes other verification schemes seem somewhat less impressive / it's less likely they are addressing the hard parts of the problem we ultimately need
... (read more)

Today, I bought 20 shares in Gamestop / GME. I expect to lose money, and bought them as a hard-to-fake signal about willingness to coordinate and cooperate in the game-theoretic sense. This was inspired by Eliezer Yudkowsky's post here: https://yudkowsky.medium.com/

In theory, Moloch should take all the resources of someone following this strategy. In practice, Eru looks after her own, so I have the money to spare.

3 · abramdemski · 2y
Is this still a short squeeze? (Have ~all of the shorts already been squeezed?)
3 · habryka · 2y
Should be fixed within half an hour! Sorry about forgetting about Denmark! (not including it was just a typo)

The AISafety.com Reading Group discussed this blog post when it was posted. There is a fair bit of commentary here: https://youtu.be/7ogJuXNmAIw

Hi Howie,

Thank you for reminding me of these four documents. I had seen them, but I dismissed them early in the process. That might have been a mistake, and I'll read them carefully now.

I think you did a great job at the interview. I describe one place where you could have pushed back more here: https://youtu.be/_kNvExbheNA?t=1376 You asked: "...Assume that among the things that these narrow AIs are really good at doing, one of them is programming AI...", and Ben Garfinkel made a broad answer about "doing science".

1 · Howie Lempel · 2y
On the documents: Unfortunately I read them nearly a year ago so my memory's hazy. But (3) goes over most of the main arguments we talked about in the podcast step by step, though it's just slides so you may have similar complaints about the lack of close analysis of the original texts. (1) is a pretty detailed write up of Ben's thoughts on discontinuities, sudden emergence, and explosive aftermath. To the extent that you were concerned about those bits in particular, I'd guess you'll find what you're looking for there.
1 · Howie Lempel · 2y
Thanks! Agree that it would've been useful to push on that point some more. I know Ben was writing up some additional parts of his argument at some point but I don't know whether finishing that up is still something he's working on.

Eric Drexler requested that I not upload a recording to YouTube. Before the session, I compiled this document with most of the questions:

https://www.dropbox.com/s/i5oqix83wsfv1u5/Comprehensive_AI_Services_Q_A.pptx?dl=0

We did not get to post the last few questions. Are there any questions from this list you would like me to try to remember the answers to?

6 · Wei_Dai · 3y
Do you have a recording of the session? If so, can you send it to me via PM or email? I'm interested in answers to pretty much all of the questions. If no recording is available, any chance you could write up as many answers as you can remember? (If not, I'll try harder to narrow down my interest. :) I'm also curious why Eric Drexler didn't want you to upload a recording to YouTube. If the answers contain info hazards, it seems like writing up the answers publicly would be bad too. If not, what could outweigh the obvious positive value of releasing the recording? If he's worried about something like not necessarily endorsing the answers that he gave on the spot, maybe someone could prepare a transcript of the session for him to edit and then post?
4 · NaiveTortoise · 3y
I'm very interested in his responses to the following questions: 1. The question addressing Gwern's post about Tool AIs wanting to be Agent AIs. 2. The question addressing his optimism about progress without theoretical breakthroughs (related to NNs/DL).

Wikipedia claims that "it is faster in cases where n > 100 or so" https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
The introduction of this Wikipedia article seems to describe these improvements as practically useful.

In my video, I describe one of the breakthroughs in matrix multiplication after Strassen as "Efficient parallelization, like MapReduce, in the nineties". This insight is used in practice, though some of the other improvements I mention are not practical.
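
For readers who have not seen it, below is a minimal sketch of Strassen's recursive multiplication (my own illustration, assuming square matrices whose side length is a power of two). The cutoff parameter is the practical point: below it the classical algorithm wins, and the exact crossover is implementation-dependent, which is roughly what Wikipedia's "n > 100 or so" figure gestures at.

```python
# Illustrative Strassen multiplication; not an optimized implementation.
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                      # below the cutoff, plain multiplication is faster
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of the naive eight
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)
```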

In the section "Finding the secret sauce", you... (read more)
