# Recent Discussion

A new paper from Google, in which they get a language model to solve some tasks that require quantitative reasoning skills (tasks which, to me, read as terrifyingly impressive). The abstract reads as follows:

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology,

...

For the math problems, they test on the basic tier (Poziom podstawowy) of the Matura.
In countries with Matura-based education, the basic tier math test is not usually taken by mathematically inclined students -- it is just the law that anyone going to a public university has to pass some sort of math exam beforehand. Students who want to study anything where mathematics skills are needed would take the higher tier (Poziom rozszerzony).
Can someone from Poland confirm this?

A quick estimate of the percentage of high-school students taking the Polish Matura exam... (read more)

1YimbyGeorge4h
Where can I access and play around with this model and/or its code?
15IL6h
The previous SOTA for MATH (https://arxiv.org/pdf/2009.03300.pdf) is a fine-tuned GPT-2 (1.5b params), whereas the previous SOTA for GSM8K (https://arxiv.org/pdf/2203.11171.pdf) is PaLM (540b params), using a similar "majority voting" method as Minerva (query each question ~40 times, take the most common answer).
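
The "majority voting" step described above can be sketched in a few lines. This is only an illustration of the idea (sample many answers, keep the most common one), not the paper's implementation; `sample_answer` and `fake_sampler` are hypothetical stand-ins for a real model call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def majority_vote(sample_answer: Callable[[str], str], question: str,
                  n_samples: int = 40) -> str:
    """Query the sampler n_samples times and return the most common answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a stochastic model: it cycles through a fixed
# list of answers in which "42" is the most frequent.
_fake_answers = cycle(["42", "41", "42", "7", "42"])


def fake_sampler(question: str) -> str:
    return next(_fake_answers)


if __name__ == "__main__":
    print(majority_vote(fake_sampler, "What is 6 * 7?"))
```

The vote only helps when wrong answers are scattered across many different values while the correct answer recurs; a model that is consistently wrong in the same way gains nothing from it.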
1Conor Sullivan8h
What is expert level on competition math problems? Do undergrads regularly get half right? EDIT: someone answered elsewhere in the comments. Looks like this model is still well behind an expert human.

Mostly non-serious and slightly silly, with some potentially interesting bits for people who are into language models.

TLDR: The current version of GPT-3 has a strong tendency to encode mangled versions of a specific phrase when asked to write morse code in zero-shot situations. This is possibly the result of a previous version of the model using essentially a single phrase for all morse code writing, which the newer version then learnt to modify.

All completions done with text-davinci-002 (~GPT-Instruct-175B) at zero temperature and with no examples unless stated otherwise. All models used are GPT-Instruct series.

# The Basics

GPT-3 'knows' morse code in a rudimentary sense. It can accurately regurgitate both the encodings of the entire alphabet and of individual letters, but it's not so great at translating words:

Morse code is...
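
For comparison with the model's attempts, here is a minimal sketch of what a correct letter-by-letter translation looks like. The letter table is standard International Morse code; separating letters with spaces and words with " / " is a common convention that I am assuming here, not something taken from the post.

```python
# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def to_morse(text: str) -> str:
    """Encode letters only; characters outside A-Z are silently skipped."""
    encoded_words = []
    for word in text.upper().split():
        letters = [MORSE[ch] for ch in word if ch in MORSE]
        encoded_words.append(" ".join(letters))
    return " / ".join(encoded_words)


if __name__ == "__main__":
    print(to_morse("hi there"))
```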

1Dirichlet-to-Neumann2h
You mean it can output a correct program that does the translation, but not translate itself? That's even weirder.
2gjm31m
I don't think it's so very weird.

Argument 1: "In order to write a program to do a thing, you must yourself understand how to do the thing."

Objection 1a: Not very true. Many not-terribly-good programmers write code that kinda-works by cobbling together things they find on the internet. I think GPT-3 does something fairly similar. Which, to be clear, is still impressive! Most humans cannot write often-kinda-working software by cobbling things together from the internet! But it is absolutely not the case that no one can write working code to do something without understanding how it works.

Objection 1b: I can write a program that calculates pi to 100 decimal places in a reasonable amount of time, but I cannot myself calculate pi to 100 decimal places without (with high probability) making mistakes along the way. (Well, as it happens I know pi to 100 decimal places, or at least have done in the past, so if that counts as "calculating" then I guess I can, but it shouldn't.)

Argument 2: "If you can write a program to do a thing, then having written it you can execute it in your head and see what the result is."

Objection 2a: Not very true. Many not-terribly-good programmers are surprisingly bad at executing programs in their heads. And GPT-3, in particular, is literally unable to do more than a fixed amount of computation per token it outputs. (It might be interesting to try to make it run a program in its head and make notes as it goes, which might let it get around that limitation, but I think the finite input window would then be a problem.)

Objection 2b: Again, I can write a program that computes pi to 100 decimal places but I cannot execute it in my head. I would at the very least need a substantial amount of paper to make notes on.

(If there's some other reason why it's weird for GPT-3 to be able to write a correct program to do a thing but not able to do the thing itself, I'm missing it.)
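
To make Objection 1b concrete, here is one way such a program might look. It is a sketch I am supplying for illustration, not the commenter's code: it uses Machin's formula, pi = 16·arctan(1/5) − 4·arctan(1/239), with plain integer arithmetic and a few guard digits to absorb rounding. The function names are my own.

```python
def arctan_inverse(x: int, unity: int) -> int:
    """Return arctan(1/x) scaled by `unity`, via the alternating Taylor series."""
    power = unity // x            # unity / x**(2k+1), starting at k = 0
    total = power
    x_squared = x * x
    divisor = 3
    sign = -1
    while power:                  # stop once terms fall below one unit
        power //= x_squared
        total += sign * (power // divisor)
        divisor += 2
        sign = -sign
    return total


def pi_decimal_string(digits: int, guard: int = 10) -> str:
    """Return pi truncated to `digits` decimal places, as a string."""
    unity = 10 ** (digits + guard)
    # Machin's formula: pi = 4 * (4*arctan(1/5) - arctan(1/239))
    scaled_pi = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    text = str(scaled_pi // 10 ** guard)   # drop the guard digits
    return text[0] + "." + text[1:]


if __name__ == "__main__":
    print(pi_decimal_string(100))
```

The program is about twenty lines, yet carrying out the same long divisions by hand would take pages of error-prone arithmetic, which is the asymmetry the objection points at.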

I tried a bit of handholding for simple program simulation, as follows:

[--- prompt begins ---]

Consider this function written in Python.

    def f(n):
        if n <= 1: return n
        else: return f(n-1) + f(n-2)

What is the value of f(5)?

Since 5 <= 1 is false, f(5) equals f(4) + f(3), so we need to know the values of those.

Since 4 <= 1 is false, f(4) equals f(3) + f(2), so we also need to know f(2).

Since 3 <= 1 is false, f(3) equals f(2) + f(1) = f(2) + 1.

Since 2 <= 1 is false, f(2) equals f(1) + f(0) = 1 + 0 = 1.

So now we can ... (read more)
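
For anyone who wants to check a model's step-by-step answer against ground truth, the hand-written walkthrough can be mirrored mechanically. The sketch below is my own illustration (the `trace` helper is hypothetical, not part of the prompt): it evaluates the same function while logging each expansion in a format similar to the prompt's reasoning.

```python
def f(n):
    if n <= 1:
        return n
    return f(n - 1) + f(n - 2)


def trace(n, depth=0, log=None):
    """Evaluate f(n) and record every expansion step, indented by call depth."""
    if log is None:
        log = []
    indent = "  " * depth
    if n <= 1:
        log.append(f"{indent}f({n}) = {n}")
        return n, log
    log.append(f"{indent}f({n}) = f({n - 1}) + f({n - 2})")
    left, _ = trace(n - 1, depth + 1, log)
    right, _ = trace(n - 2, depth + 1, log)
    log.append(f"{indent}f({n}) = {left} + {right} = {left + right}")
    return left + right, log


if __name__ == "__main__":
    value, steps = trace(5)
    print("\n".join(steps))
    print("f(5) =", value)
```

Unlike the prompt's walkthrough, this trace recomputes repeated subcalls such as f(3) rather than reusing them, so the log is longer than the hand-written version.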

1Megan Kinniment2h
Yep, GPT is usually pretty good at picking up on patterns within prompts. You can also get it to do small Caesar shifts of short words with similar handholding.
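
As a reference for checking such completions, a Caesar shift is short to write down. This is a minimal sketch with a function name of my choosing; it shifts ASCII letters only and leaves everything else untouched.

```python
import string


def caesar_shift(word: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    shifted = []
    for ch in word:
        if ch in string.ascii_lowercase:
            base = ord("a")
        elif ch in string.ascii_uppercase:
            base = ord("A")
        else:
            shifted.append(ch)   # digits, spaces, punctuation pass through
            continue
        shifted.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(shifted)


if __name__ == "__main__":
    print(caesar_shift("cat", 1))
```

A negative `shift` undoes a positive one, so the same function serves for decoding.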

‘I don’t feel emotionally motivated to work on AI safety, even though I’m intellectually convinced that it’s important.’

It always surprises me when people say this because I find my work at Nonlinear on AI safety incredibly motivating. I’m sharing my reasons in the hope that they’ll resonate with some of you, and that these ideas will help bring your emotional drives into greater harmony with your abstract convictions.

# 1. The dramatic scale of AGI is inspiring

When I was a kid, I wanted to save the world. Like many EAs, I was obsessed with stories of superheroes who could use their powers to save whole cities from catastrophe. I aspired to be like Gandhi, or Martin Luther King, and to do something really big and important; something that would...

2Viliam16h
Is there any good software that would record your voice and convert it to text?

Otter (a smartphone app) is very good. So I've started using it recently for taking notes. Haven't tried using it to write an extended post about anything, though it could be a useful way of getting a first draft.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

1mikbp2h
I know there are posts on LW that mention a behaviour and/or productivity tip of the form "if/when X happens, do Y". I don't know what this is called, so I am not able to find any. Could anybody point me in the right direction, please?
4Kaj_Sotala1h
Trigger-Action Planning [https://www.lesswrong.com/tag/trigger-action-planning] ("implementation intentions" in the academic literature)

Awesome, thanks Kaj!

2Oscar_Cunningham4h
Is there a way to alter the structure of a futarchy to make it follow a decision theory other than EDT?

Many years ago, a blogger made a post advocating for an evil Y-Combinator which subsidized the opposite of Effective Altruism. Everyone (including the blogger) thought the post was a joke except the supervillains. The organization they founded celebrated its 10th anniversary this year. An attendee leaked to me a partial transcript from one of its board meetings.

Director: Historically, public unhealth has caused the most harm per dollar invested. How is is the Center for Disease Proliferation doing?

CDP Division Chief: Gain-of-function research remains—in principle—incredibly cheap. All you have to do is infect ferrets with the flu and let them spread it to one another. We focus on maximizing transmission first and then, once we have a highly-transmissible disease, select for lethality (ideally after a long asymptomatic infectious period).

CFO:...

Typos:
How is is the Center for Disease Proliferation doing?
Did the CDP have anything to do with COVID-19?
Building solar power plants is cheaper than building coal power plants.

I've argued that the development of advanced AI could make this the most important century for humanity. A common reaction to this idea is one laid out by Tyler Cowen here: "how good were past thinkers at predicting the future? Don’t just select on those who are famous because they got some big things right."

This is a common reason people give for being skeptical about the most important century - and, often, for skepticism about pretty much any attempt at futurism (trying to predict key events in the world a long time from now) or steering (trying to help the world navigate such key future events).

The idea is something like: "Even if we can't identify a particular weakness in arguments about key future events, perhaps we...

Thanks for another thought provoking post. This is quite timely for me, as I've been thinking a lot about the difference between the work of futurists as compared to forecasters.

These are people who thought a lot about science and the future, and made lots of predictions about future technologies - but they're famous for how entertaining their fiction was at the time, not how good their nonfiction predictions look in hindsight. I selected them by vaguely remembering that "the Big Three of science fiction" is a thing people say sometimes, googling it,

1Bezzi3h
Asimov may not have been a professional forecaster, but he was still someone who had thought a lot about the future in the most realistic way possible (and he got invited quite often on TV to talk about it, if I remember correctly), especially considering that he also wrote a crazy amount of scientific nonfiction [https://www.goodreads.com/shelf/show/nonfiction-asimov]. Maybe he's more famous as a science fiction author, but he was also a very well-known futurologist, not just some random smart guy who happened to make some predictions. I would be quite surprised to hear about anyone else from the 60s with a better futurology record than him. That said, I am still quite convinced that the average smart person would still make terrible predictions about the long-term future. The best example I can offer is this [https://rarehistoricalphotos.com/french-postcards-futuristic-world-year-2000/], a rare set of illustrations printed in France in 1899 to imagine what France would look like in the year 2000. Of course, the vast majority of these predictions were comically bad. It is worth noting that we mainly know about these postcards because Asimov himself published a book about them in the 80s (this is not a coincidence because nothing is ever a coincidence).
14simon7h
There's a lot of room for debate on the correctness of the resolutions of these predictions:

e.g. Heinlein in 1949: This is marked as incorrect, due to the marker assuming that this meant mass space travel, but I wouldn't interpret this as mass space travel unless there's some relevant context I'm missing here - keep in mind that this was from 1949, 8 years before Sputnik.[1] [#fnj98w5zi4ehk]

On the other hand: This is marked as correct, apparently due to autopilot and the "USAF Airborne Command Post"? But I would interpret it as active control of the planes by a centralized computer and mark it as incorrect.[2] [#fn2evfigbp5yy]

Edited to add: there were a bunch I could have mentioned, but I want to remark on this one, where my interpretation was especially different from the marker's: This is also from 1949. The marker interprets this as a prediction of "Commercial interplanetary travel". I see it rather as a conditional prediction of interplanetary travel (not necessarily commercial), given the willingness to fund it, i.e. a prediction that the necessary technology would be available but not necessarily that it would be funded. If this is the right interpretation, it seems correct to me. Again, I could be completely wrong depending on the context.[3] [#fn1k8p313b70n]

1. ^ [#fnrefj98w5zi4ehk]Edited to add: I realized I actually have a copy of Heinlein's "Expanded Universe" which includes "Where To?" and followup 1965 and 1980 comments. In context, this statement comes right in the middle of a discussion of hospitals for old people on the moon, which considerably shifts the interpretation towards it being intended to refer to mass space travel, though if Heinlein were still here he could argue it literally meant any space travel.

2. ^ [#fnref2evfigbp5yy]In context, it's not 100% clear that he meant a single computer, though I still think so. But he definitely meant full automation outside of emergency or unusual situati
27johnswentworth14h
My guess would be these measures result in predictions somewhat worse than the Big Three. If you want a reference class for "more serious" forecasting, I'd say go look for forecasts by fancy consulting agencies or thinktanks. My guess would be that they do somewhat worse, mainly because their authors are optimizing to Look Respectable rather than just optimizing purely for accuracy. And the AI researcher surveys and OpenPhil reports also sure do look like they're optimizing a significant amount for Looking Respectable.

In southern California there’s a two-acre butterfly preserve owned by the oil company Chevron. They spend little to maintain it, but many millions on television advertisements featuring it as evidence of their environmental stewardship.[1]

Environmentalists have a word for behavior like this: greenwashing. Greenwashing is when companies misleadingly portray themselves, or their products, as more environmentally-friendly than they are.

Greenwashing often does cause real environmental benefit. Take the signs in hotels discouraging you from washing your towels:

My guess is that the net environmental effect of these signs is in fact mildly positive. And while the most central examples of greenwashing involve deception, I’m sure some of these signs are put up by people who earnestly care. But I suspect hotels might tend to care less about water waste if utilities...

A tongue-in-cheek suggestion for noticing this phenomenon: when you encounter professions of concern about alignment, ask yourself whether it seems like the person making those claims is hoping you’ll react like the marine mammals in this DuPont advertisement, dancing to Beethoven’s “Ode to Joy” about the release of double-hulled oil tankers.

From time to time, someone makes the case for why transparency in reasoning is important. The latest conceptualization is Epistemic Legibility by Elizabeth, but the core concept is similar to reasoning transparency used by OpenPhil, and also has some similarity to A Sketch of Good Communication by Ben Pace.

I'd like to offer a gentle pushback. The tl;dr is in my comment on Ben's post, but it seems useful enough for a standalone post.

“How odd I can have all this inside me and to you it's just words.” ― David Foster Wallace

### When and why reasoning legibility is hard

Say you demand transparent reasoning from AlphaGo. The algorithm has roughly two parts: tree search and a neural network. Tree search reasoning is naturally legible: the "argument" is simply a sequence of board states. In contrast,...

I don't think the intuition "both are huge" so "~ roughly equal" is correct.

Tree search is decomposable into a specific sequence of board states, which are easily readable; in practice trees are pruned, and can be pruned to human-readable sizes.

This isn't true for the neural net. Suppose you decompose the information in the AlphaGo net into a huge list of arithmetic. If the "arithmetic" is the whole training process, the list is much larger than in the first case. If it's just the trained net, it's less interpretable than the tree.

1David Johnston14h
I don't think the analogy is great, because Go grandmasters have actually played, lost and (critically) won a great many games of Go. This has two implications: first, I can easily check their claims of expertise. Second, they have had many chances to improve their gut-level understanding of how to play the game of Go well, and this kind of thing seems to be necessary to develop expertise. How does one go about checking gut-level intuitions about AI safety? It seems to me that turning gut intuitions into legible arguments that you and others can (relatively) easily check is one of the few tools we have, with objectively assessable predictions being another. Sure, both are hard, and it would be nice if we had easier ways to do it, but it seems to me that that's just how it is.

TL;DR: In this project, we collected and cataloged AI alignment research literature and analyzed the resulting dataset in an unbiased way to identify major research directions. We found that the field is growing quickly, with several subfields emerging in parallel. We looked at the subfields and identified the prominent researchers, recurring topics, and different modes of communication in each. Furthermore, we found that a classifier trained on AI alignment research articles can detect relevant articles that we did not originally include in the dataset.

(video presentation here)

# Dataset Announcement

In the context of the 6th AISC, we collected a dataset of alignment research articles from a variety of different sources. This dataset is now available for download here and the code for reproducing the scrape is on GitHub here[1]. When...

1Ben Smith12h
I would very much like to see your dataset, as a zotero database or some other format, in order to better orient myself to the space. Are you able to make this available somehow?
1Ben Smith12h
Very very helpful! The clustering is obviously a function of the corpus. From your narrative, it seems like you only added the missing arXiv files after clustering. Is it possible the clusters would look different with those in?

Hey Ben! :) Thanks for the comment and the careful reading!

Yes, we only added the missing arXiv papers after clustering, but then we repeated the dimensionality reduction and showed that the original clustering still holds up even with the new papers (Figure 4 bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering) but of course the clusters might look slightly different if we also re-run k-means on the extended dataset.
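
The check mentioned at the end (re-running k-means on the extended dataset and seeing whether the original assignments move) can be sketched on toy data. Everything below is an assumption for illustration: the 2D points, the fixed starting centroids, and the plain Lloyd's-algorithm implementation are mine, not the project's pipeline or data.

```python
def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm on tuples of coordinates, with fixed initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[k])   # keep an empty cluster where it was
        if updated == centroids:               # converged
            break
        centroids = updated
    return labels, centroids


# Toy stand-ins: "original" documents and a few documents added later.
original = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
added_later = [(1, 1), (9, 9)]
start = [(0, 0), (10, 10)]

labels_before, _ = kmeans(original, start)
labels_after, _ = kmeans(original + added_later, start)

# Do the original points keep their cluster once the new points are included?
stable = labels_before == labels_after[:len(original)]

if __name__ == "__main__":
    print("original assignments unchanged:", stable)
```

On real embeddings the comparison is less clean, since k-means labels can permute between runs; matching clusters by overlap before comparing would be needed there.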