This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialog, Eliezer explores and counters common objections to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

dirk, 8h
Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept.
Fabien Roger, 9h
"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002. When using length-10 lists (it crushes length-5 no matter the prompt), I get:

* 32-shot, no fancy prompt: ~25%
* 0-shot, fancy python prompt: ~60%
* 0-shot, no fancy prompt: ~60%

So few-shot hurts, but the fancy prompt does not seem to help. Code here.

I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. This is because I'm looking for counterexamples to the following conjecture: "fine-tuning on k examples beats fancy prompting, even when fancy prompting beats k-shot prompting" (for a reasonable value of k, e.g. the number of examples it would take a human to understand what is going on).
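For concreteness, here is a minimal sketch of how the two prompt styles might be constructed and graded; the exact wording, list construction, and grading rule are assumptions on my part, not taken from the linked code.

```python
import random

random.seed(0)

def make_example(length=10):
    """One random unsorted list of distinct small integers."""
    return random.sample(range(100), length)

def k_shot_prompt(k, xs):
    """Plain k-shot prompt: k solved examples, then the query list.
    k=0 gives the '0-shot, no fancy prompt' condition."""
    parts = []
    for _ in range(k):
        ex = make_example(len(xs))
        parts.append(f"List: {ex}\nSorted: {sorted(ex)}")
    parts.append(f"List: {xs}\nSorted:")
    return "\n\n".join(parts)

def fancy_python_prompt(xs):
    """'Fancy' zero-shot prompt: frame sorting as completing a Python REPL
    session, without adding information a human would need for the task."""
    return f">>> xs = {xs}\n>>> print(sorted(xs))\n"

def grade(completion, xs):
    """Score a completion by whether it begins with the correctly sorted list."""
    return completion.strip().startswith(str(sorted(xs)))

# Each prompt would be sent to a base completion model (e.g. davinci-002)
# at temperature 0, and the returned text scored with `grade`.
xs = make_example(10)
print(k_shot_prompt(2, xs))
print(fancy_python_prompt(xs))
```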
The cost of goods has the same units as the cost of shipping: $/kg. Referencing between them lets you understand how the economy works, e.g. why construction material sourcing and drink bottling has to be local, but oil tankers exist.

* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef, copper, and off-season strawberries are $11/kg, about the same as a 75kg person taking a three-hour, 250km Uber ride costing $3/km.
* Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as the $0.72 it costs to ship them 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, significantly more than the cost of shipping them around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient. [6]
* Water is very cheap, with tap water at $0.002/kg in NYC. [5] But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.

It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.

[1] iPhone is $4600/kg, large launches sell for $3500/kg, and rideshares for small satellites $6000/kg. Geostationary orbit is more expensive, so it's okay for them to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find numbers, but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
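As a rough illustration of the comparison being made, here is a back-of-the-envelope sketch using the truck freight rate and prices quoted above. The break-even distances are only as good as those inputs, and sea freight is far cheaper than trucking, so coal and water can in practice travel much further than these numbers suggest.

```python
# Back-of-the-envelope: how far can you truck a good before shipping
# costs as much as the good itself? Uses the ~$0.72 per kg per 5000 km
# truck rate quoted in the comment above.
truck_rate = 0.72 / 5000  # $ per kg per km

goods_price_per_kg = {
    "iPhone": 4600.0,
    "beef": 11.0,
    "oranges": 3.0,
    "rice": 0.60,
    "coal": 0.10,
    "tap water (NYC)": 0.002,
}

for name, price in goods_price_per_kg.items():
    breakeven_km = price / truck_rate
    print(f"{name:>16}: ${price:>8.3f}/kg, trucking break-even ~{breakeven_km:,.0f} km")
```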
Eric Neyman, 2d
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Research Writing Workflow: First figure stuff out

* Do research and figure stuff out, until you feel like you are not confused anymore.
* Explain it to a person, or a camera, or ideally to a person and a camera.
* If there are any hiccups, expand your understanding.
* Ideally, as the last step, explain it to somebody you have never explained it to before.
* Only once you have made a presentation without hiccups are you ready to write the post.
* If you have a recording, it is useful as a starting point.

Popular Comments

Recent Discussion

This is a Concept Post. It may not be worth reading on its own, out of context. See the backlinks at the bottom to see which posts use this concept.

See the backlinks at the bottom of the post. Every post starting with [CP] is a concept post that describes a concept this post is using.


Problem: When writing, I often come up with general concepts that make sense in isolation, and I want to reuse these concepts without having to re-explain them.

A Concept Post explains a single concept, usually with no or minimal context. It is expected that the relevant context is provided by another post that links to the concept post.

Concept Posts can be very short, much shorter than a regular post, and they might not be worth reading on their own. Hence the notice at the top...

quila, 7m
i would use [Concept Dependency] or [Concept Reference] instead so the reader understands just from seeing the title on the front page. also avoids acronym collision
Nathan Young, 7m
If that were true then there are many ways you could partially do that - eg give people a set of tokens to represent their mana at the time of the devaluation, and if at a future point you raise it again, you could give them 10x those tokens back.
James Grugett, 3h
We are trying our best to honor mana donations! If you are inactive, you have until the end of the year to donate at the old rate. If you want to donate all your investments without having to sell each one individually, we are offering you a loan to do that. We removed the charity cap of $10k of donations per month, which goes beyond what we previously communicated.

Nevertheless lots of people were hassled. That has real costs, both to them and to you. 

Nathan Young, 1h
I’m discussing with Carson. I might change my mind, but I don’t know that I’ll argue with both of you at once.

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction. 

Behold, a tab with recommendations!

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...
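For readers unfamiliar with it, here is a minimal sketch of the commonly cited Hacker News-style ranking formula. LessWrong's actual implementation differs in its details (e.g. karma weighting), so treat this as an illustration of the recency pressure, not the site's exact code.

```python
def hn_rank(score: float, age_hours: float, gravity: float = 1.8) -> float:
    """Commonly cited Hacker News-style ranking: upvotes decay with age.

    A post's rank falls polynomially as it ages, so recently posted items
    dominate the frontpage regardless of their long-run value.
    """
    return (score - 1) / (age_hours + 2) ** gravity

# A 10-point post from an hour ago outranks a 100-point post from two days ago.
print(hn_rank(10, 1))    # ~1.25
print(hn_rank(100, 48))  # ~0.09
```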

habryka, 13m
GDPR is a giant mess, so it's pretty unclear what it requires us to implement. My current understanding is that it just requires us to tell you that we are collecting analytics data if you are from the EU. And the kind of stuff we are sending over to Recombee counts as data necessary to provide site functionality, not just analytics, so it wouldn't be covered by that. (If you want to avoid data being sent to Google Analytics in particular, you can do that by blocking the GA script in uBlock Origin or whatever other adblocker you use, which it should do by default.)

drat, I was hoping that one would work. oh well. yes, I use ublock, as should everyone. Have you considered simply not having analytics at all :P I feel like it would be nice to do the thing that everyone ought to do anyway since you're in charge. If I was running a website I'd simply not use analytics.

back to the topic at hand, I think you should just make a vector embedding of all posts and show a HuMAP layout of it on the homepage. that would be fun and not require sending data anywhere. you could show the topic islands and stuff.
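A minimal sketch of what that could look like, assuming sentence-transformers for the embeddings and umap-learn for the 2D layout (the comment's "HuMAP" suggests a UMAP-family method); the model name, parameters, and example titles are illustrative placeholders, not a proposal for LessWrong's actual pipeline.

```python
# pip install sentence-transformers umap-learn matplotlib
from sentence_transformers import SentenceTransformer
import umap
import matplotlib.pyplot as plt

# Placeholder corpus; in practice this would be all post titles or bodies.
posts = [
    "The Rocket Alignment Problem",
    "Cost of goods vs cost of shipping",
    "Povidone iodine as a respiratory intervention",
    "Glitch tokens and GPT-3 interpretability",
    "Highly counterfactual scientific discoveries",
    "Recommendations tab frontpage experiment",
]

# Embed each post, then project to 2D so nearby posts form "topic islands".
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(posts)
coords = umap.UMAP(n_neighbors=5, min_dist=0.1, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], s=8)
for (x, y), title in zip(coords, posts):
    plt.annotate(title[:30], (x, y), fontsize=6)
plt.title("Post map (2D embedding layout)")
plt.show()
```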

kave, 27m
I am sad to see you getting so downvoted. I am glad you are bringing this perspective up in the comments.
habryka, 29m
I am pretty excited about doing something more in-house, but it's much easier to get data about how promising this direction is by using third-party services that already have all the infrastructure. If it turns out to be a core part of LW, it makes more sense to bring it in-house. It's also really valuable to have a relatively validated baseline to compare things to. There are a bunch of third-party services we couldn't really replace that we send user data to: Hex.tech as our analytics dashboard service, Google Analytics for basic user behavior and patterns, and a bunch of AWS services. Implementing the functionality of all of that ourselves, or putting a bunch of effort into anonymizing the data, is not impossible but seems pretty hard, and Recombee seems about par for the degree to which I trust them not to do anything with that data themselves.

The history of science has tons of examples of the same thing being discovered multiple times independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

Johannes C. Mayer, 38m
A few adjacent thoughts:

* Why is a programming language like Haskell not used more widely, given that it is extremely powerful in the sense that if your program compiles, it is with very high probability the program you want, because most stupid mistakes are now compile errors?
* Why is there basically no widely used homoiconic language, i.e. a language in which you can use the language itself to reason about and manipulate the language? Here we have some technology that is basically ready to use (Haskell or Clojure), but people decide mostly not to use it. And by people, I mean professional programmers and companies that make software.
* Why did nobody invent Rust earlier, by which I mean a systems-level programming language that prevents you from making really dumb mistakes that can be machine-checked?
* Why did it take like 40 years to get a LaTeX replacement, even though LaTeX is terrible in very obvious ways?

These things have in common that there is a big engineering challenge. It feels like maybe this explains it, together with the fact that the people who would have benefited from these technologies were in a position where the cost of creating them would have exceeded the benefit they expected from them. For Haskell and Clojure we can also consider this point: certainly, these two technologies have their flaws and could be improved, but then again that would be a massive engineering challenge.
Alexander Gietelink Oldenziel, 2h
I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.

The story that symmetries mean that the parameter-to-function map is not injective is true but already well-understood outside of SLT. It is a common misconception that this is what SLT amounts to. To be sure, generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.

The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second 'green' book. Nonrealizability is key to the most important insight of SLT, contained in the last sections of the second-to-last chapter of the green book: algorithmic development during training through phase transitions in the free energy. I don't have the time to recap this story here.
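For readers who want the formula being gestured at: a minimal sketch of Watanabe's asymptotic expansion of the Bayes free energy, as I understand it, with notation and regularity conditions glossed over.

```latex
% Sketch of Watanabe's free energy asymptotics: w_0 is an optimal parameter,
% \lambda the RLCT, and m its multiplicity. Under mild nonvanishing conditions
% on the prior, \lambda does not depend on the prior.
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log \log n + O_p(1)
```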

Lucius-Alexander SLT dialogue?

Alexander Gietelink Oldenziel, 2h
All proofs are contained in Watanabe's standard text; see here: https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A

Before we get started, this is your quarterly reminder that I have no medical credentials and my highest academic credential is a BA in a different part of biology (with a double major in computer science). In a world with a functional medical system no one would listen to me. 

Tl;dr povidone iodine probably reduces viral load when used in the mouth or nose, with corresponding decreases in symptoms and infectivity. The effect size could be as high as 90% for prophylactic use (and as low as 0% when used in late illness), but is probably much smaller. There is a long tail of side-effects. No study I read reported side effects at clinically significant levels, but I don’t think they looked hard enough. There are other gargle...

Gordon Seidoh Worley, 4h
I'm somewhat confused. I may not be reading the charts you included right, but it sort of looks to me like just rinsing with saline is useful, and that seems like it should be extremely safe and low risk and just about as effective as anything else. Thoughts?

Yep, that's my main contender for the better formulations referred to in the intro.

This is a Concept Post. It may not be worth reading on its own, out of context. See the backlinks at the bottom to see which posts use this concept.


Also known as periodically labeled lattice graphs in graph theory.

Here is a concrete Edge Regular Lattice Graph: [figure: example of an edge regular lattice graph]

In this graph, the following pattern is repeating locally: [figure: the locally repeating edge-label pattern]

So an Edge Regular Lattice Graph is a lattice graph G such that, in the natural embedding of G, each edge label points in the same direction from the perspective of every vertex. Also, the number of edge labels is twice the number of dimensions.

Above we have 4 edge labels in the 2D lattice graph, one for each direction. In a 3D lattice graph, we would have 6 edge labels.
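A minimal sketch of the 2D case in code, under the interpretation above; the direction names and the wrap-around boundary are illustrative choices, not part of the definition.

```python
# Build a small 2D edge-labeled lattice graph and check the edge-regularity
# property: every edge with a given label corresponds to the same offset
# from the perspective of every vertex.
WIDTH, HEIGHT = 4, 4

# Four edge labels in 2D, one per direction (a 3D lattice would have six).
OFFSETS = {
    "right": (1, 0),
    "left": (-1, 0),
    "up": (0, 1),
    "down": (0, -1),
}

def neighbor(v, label):
    """Vertex reached from v by following the edge with the given label
    (wrapping around, so the local pattern repeats everywhere)."""
    (x, y), (dx, dy) = v, OFFSETS[label]
    return ((x + dx) % WIDTH, (y + dy) % HEIGHT)

vertices = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
edges = {(v, neighbor(v, label), label) for v in vertices for label in OFFSETS}

# Edge-regularity holds by construction here: each label always means the
# same displacement, no matter which vertex you stand at.
for label, (dx, dy) in OFFSETS.items():
    assert all(neighbor((x, y), label) == ((x + dx) % WIDTH, (y + dy) % HEIGHT)
               for (x, y) in vertices)

print(len(vertices), "vertices,", len(edges), "labeled edges,", len(OFFSETS), "edge labels")
```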


OC ACXLW Sat April 27 Argumentation and College Admissions

Hello Folks! We are excited to announce the 63rd Orange County ACX/LW meetup, happening this Saturday and most Saturdays after that.

Host: Michael Michalchik
Email: michaelmichalchik@gmail.com (for questions or requests)
Location: 1970 Port Laurent Place, (949) 375-2045
Date: Saturday, April 27, 2024
Time: 2 pm

Conversation Starters:

  1. You Can Make an Argument for Anything by Nathan J. Robinson: This article argues that it is easy to create superficially convincing arguments for almost any position, no matter how heinous or false. The author suggests that the prevalence of these arguments can make it difficult for the truth to compete in the "marketplace of ideas."

Text link: https://www.currentaffairs.org/2018/11/you-can-make-an-argument-for-anything

Questions for discussion: a) The article suggests that people often do not investigate arguments very closely, and are...

TL;DR: All GPT-3 models were decommissioned by OpenAI in early January. I present some examples of ongoing interpretability research which would benefit from the organisation rethinking this decision and providing some kind of ongoing research access. This also serves as a review of work I did in 2023 and how it progressed from the original 'SolidGoldMagikarp' discovery just over a year ago into much stranger territory.

Introduction

Some months ago, when OpenAI announced that the decommissioning of all GPT-3 models was to occur on 2024-01-04, I decided I would take some time in the days before that to revisit some of my "glitch token" work from earlier in 2023 and deal with any loose ends that would otherwise become impossible to tie up after that date.

This abrupt termination...

eukaryote, 14h
Killer exploration into new avenues of digital mysticism. I have no idea how to assess it but I really enjoyed reading it.

Thanks!

I agree. This is unfortunately often done in various fields of research where familiar terms are reused as technical terms.

For example, in ordinary language "organic" means "of biological origin", while in chemistry "organic" describes a type of carbon compound. Those two definitions mostly coincide on Earth (most such compounds are of biological origin), but when astronomers announce they have found "organic" material on an asteroid this leads to confusion.

Viliam, 3h
Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this: You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there is probably a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time. (This is probably wrong, but hey, people say that the best way to elicit an answer is to provide a wrong one.)
cubefox, 4h
Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".
dkornai, 6h
I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vague phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts into words.

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA