This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

dirk 3h
Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept.
My current main cruxes:
1. Will AI get takeover capability? When?
2. Single ASI or many AGIs?
3. Will we solve technical alignment?
4. Value alignment, intent alignment, or CEV?
5. Defense>offense or offense>defense?
6. Is a long-term pause achievable?
If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.
Thomas Kwa 19h
The cost of goods has the same units as the cost of shipping: $/kg. Comparing the two lets you understand how the economy works, e.g. why construction-material sourcing and drink bottling have to be local, but oil tankers exist.

* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef, copper, and off-season strawberries are $11/kg, about the same as a 75 kg person taking a three-hour, 250 km Uber ride costing $3/km.
* Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as the $0.72 it costs to ship a kilogram 5000 km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, significantly more than the cost of shipping them around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient. [6]
* Water is very cheap, with tap water at $0.002/kg in NYC. [5] But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.

It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.

[1] iPhone is $4,600/kg, large launches sell for $3,500/kg, and rideshares for small satellites $6,000/kg. Geostationary orbit is more expensive, so it's okay for those launches to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find current numbers, but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
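To make the unit conversions concrete, here is a minimal back-of-envelope sketch (my own arithmetic, using only the figures quoted in the comment above; the implied ton-mile rate is derived, not re-sourced) of how two of these comparisons fall out.

```python
# Back-of-envelope arithmetic behind two of the comparisons above.
# All inputs are the figures quoted in the comment; treat them as illustrative.

# A 75 kg person on a three-hour, 250 km Uber ride at ~$3/km:
uber_total = 250 * 3.0            # $750 for the ride
uber_per_kg = uber_total / 75     # $10/kg -- comparable to ~$11/kg beef or copper

# Shipping 1 kg of rice ~5000 km across the US by truck for ~$0.72 implies a
# freight rate per metric-ton-mile of roughly:
miles = 5000 * 0.621371
implied_rate_per_ton_mile = 0.72 * 1000 / miles   # ~$0.23 per ton-mile

print(f"Uber: ${uber_per_kg:.2f}/kg, implied trucking rate: ${implied_rate_per_ton_mile:.2f}/ton-mile")
```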
Fabien Roger 4h
"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002. When using length-10 lists (it crushes length-5 no matter the prompt), I get:
* 32-shot, no fancy prompt: ~25%
* 0-shot, fancy python prompt: ~60%
* 0-shot, no fancy prompt: ~60%
So few-shot hurts, but the fancy prompt does not seem to help. Code here. I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. This is because I'm looking for counterexamples to the following conjecture: "fine-tuning on k examples beats fancy prompting, even when fancy prompting beats k-shot prompting" (for a reasonable value of k, e.g. the number of examples it would take a human to understand what is going on).
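For anyone wanting to run this kind of comparison themselves, here is a hypothetical sketch (not Fabien's linked code) of a k-shot list-sorting eval; `query_model` is a placeholder for whatever completion model you use (e.g. davinci-002), and the exact-match scoring rule is my assumption.

```python
# Hypothetical sketch of a k-shot list-sorting eval (k=0 gives the 0-shot baseline).
import random

def make_list(n=10):
    return [random.randint(0, 99) for _ in range(n)]

def k_shot_prompt(k, xs):
    parts = []
    for _ in range(k):
        ex = make_list(len(xs))
        parts.append(f"Input: {ex}\nSorted: {sorted(ex)}")
    parts.append(f"Input: {xs}\nSorted:")
    return "\n\n".join(parts)

def query_model(prompt: str) -> str:
    # Placeholder: call your completion model here (e.g. davinci-002).
    raise NotImplementedError

def accuracy(k, n_trials=100):
    correct = 0
    for _ in range(n_trials):
        xs = make_list()
        completion = query_model(k_shot_prompt(k, xs))
        correct += completion.strip().startswith(str(sorted(xs)))
    return correct / n_trials
```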
dirk 4h
I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's, it's easy to substitute yours in without realizing you've misunderstood.

Popular Comments

Recent Discussion

Austin said they have $1.5 million in the bank, vs. $1.2 million mana issued. The only outflows right now are to the charity programme, which even with a lot of outflows is only at $200k. They also recently raised at a $40 million valuation. I am confused by the claim that they are running out of money. They have a large user base that wants to bet and will do so at larger amounts if given the opportunity. I'm not so convinced that there is some tight timeline here.

But if there is, then say so: "we know that we often talked about mana being eventually worth $100 mana per dollar, but ... (read more)

Nathan Young 3m
Austin took his salary in mana, an often-referred-to incentive for him to want mana to become valuable, presumably at that rate. I recall comments like 'we pay 250 mana per user in referrals because we reckon we'd pay about $2.50', and likewise at the in-person mana auction. I'm not saying it was an explicit contract, but there were norms.
Nathan Young 4m
"That being said, we will do everything we can to communicate to our users what our plans are for the future and work with anyone who has participated in our platform with the expectation of being able to donate mana earnings." "Everything we can" is not a couple of weeks' notice and a lot of hassle. Am I supposed to trust this organisation in future with my real money? (via https://manifoldmarkets.notion.site/Charitable-donation-program-668d55f4ded147cf8cf1282a007fb005)
Nathan Young 5m
Well, they have a much larger donation than has been spent, so there were ways to avoid this abrupt change: "Manifold for Good has received grants totaling $500k from the Center for Effective Altruism (via the FTX Future Fund) to support our charitable endeavors." Manifold has donated $200k so far, so there is $300k left. Why not at least say, "we will change the rate at which mana can be donated when we burn through this money"? (via https://manifoldmarkets.notion.site/Charitable-donation-program-668d55f4ded147cf8cf1282a007fb005)
Bogdan Ionut Cirstea 6h
Hey Jacques, sure, I'd be happy to chat!  
Bogdan Ionut Cirstea 6h
Yeah, I'm unsure if I can tell any 'pivotal story' very easily (e.g. I'd still be pretty skeptical of enumerative interp even with GPT-5-MAIA). But I do think, intuitively, GPT-5-MAIA might e.g. make 'catching AIs red-handed' using methods like in this comment significantly easier/cheaper/more scalable. 

But I do think, intuitively, GPT-5-MAIA might e.g. make 'catching AIs red-handed' using methods like in this comment significantly easier/cheaper/more scalable.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

TL;DR

Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships. Tacit Knowledge Videos could widen this bottleneck. This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos. Scroll down to the list if that's what you're here for. Post videos that highlight tacit knowledge in the comments and I’ll add them to the post. Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others. 

What are Tacit Knowledge Videos?

Samo Burja claims YouTube has opened the gates for a revolution in tacit knowledge transfer. Burja defines tacit knowledge as follows:

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create

...

"Mise En Place", "[i]nterviews and kitchen walkthroughs:

Qualifies as tacit knowledge, in that people are showing you what they're doing, which you seldom have a chance to watch first-hand. Reasonably entertaining, and it seems like you could learn a bit here.

Caveat: most of the dishes are really high-class meat/fish dishes etc. that you aren't very likely to ever cook yourself, and the knowledge seems difficult to transfer.

Wei Dai 6h
Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018, that these values could well be negative. I'm very uncertain, but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.
  • My current guess is that max good and max bad seem relatively balanced. (Perhaps max bad is 5x more bad/flop than max good in expectation.)
  • There are two different (substantial) sources of value/disvalue: interactions with other civilizations (mostly acausal, maybe also aliens) and what the AI itself terminally values
  • On interactions with other civilizations, I'm relatively optimistic that commitment races and threats don't destroy as much value as acausal trade generates on some general view like "actually going through with threats is a waste of resourc
... (read more)
mesaoptimizer 7h
e/acc is not a coherent philosophy, and treating it as one means you are fighting shadows. Landian accelerationism at least is somewhat coherent. "e/acc" is a bundle of memes that support the self-interest of the people supporting and propagating it, both financially (VC money, dreams of making it big) and socially (the non-Beff e/acc vibe is one of optimism and hope and of doing things -- of engaging with the object level -- instead of just trying to steer social reality). A more charitable interpretation is that the philosophical roots of "e/acc" are founded upon a frustration with how bad things are, and a desire to improve things yourself. This is a sentiment I share and empathize with. I find the term "techno-optimism" to be a more accurate description of the latter, perhaps "Beff Jezos philosophy" a more accurate description of what you have in mind, and "e/acc" mainly a description of the community and its coordinated attempts at steering the world towards outcomes that the people within the community perceive as benefiting them.
Quinn 2h
sure -- i agree; that's why i said "something adjacent to", because it had enough overlap in properties. I think my comment completely stands with a different word choice, I'm just not sure what word choice would do a better job.

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide are discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in eg cybersecurity twitter, mathy AI alignment, non-scam crypto twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying 1.11^(2/3) - 1 ≈ 7% more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar.
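A quick check of the scaling step in parentheses, under the assumption (implicit in the OP, not stated) that cortical surface area scales as the 2/3 power of brain volume:

```python
# Isometric-scaling assumption: area ~ volume^(2/3), so an 11% heavier brain
# implies roughly 7% more cortical area.
print(f"{1.11 ** (2 / 3) - 1:.1%}")   # ~7.2%
```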


My theory...

Yes, my point is that the low T did it before the transition.

kromem 16h
It implicitly does compare trans women to other women in talking about the performance similarity between men and women: "Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying 1.11^(2/3) - 1 ≈ 7% more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar." So OP is saying "look, women and men are the same, but trans women are exceptional." I'm saying that identifying the exceptionality of trans women ignores the environmental disadvantage other women experience, such that the earlier claims of unexceptional performance of women (which, as I quoted, come with an explicit presumption of likely male competency based on what's effectively phrenology) reflect a disadvantaged sample vs. trans women. My point is that if you accounted for environmental factors, the data would potentially show female exceptionality across the board, and the key reason trans women end up being an outlier against both men and other women is that they avoid the early educational disadvantage other women experience.
This is a linkpost for https://medium.com/p/aeb68729829c

It's a ‘superrational’ extension of the proven optimality of cooperation in game theory 
+ Taking into account asymmetries of power
// Still AI risk is very real

Short version of an already skimmed 12min post
29min version here


For rational agents (long-term) at all scale (human, AGI, ASI…)


In real contexts, with open environments (world, universe), there is always a risk of meeting someone/something stronger than you, and even overall-weaker agents may be specialized in your flaws/blind spots.


To protect yourself, you can choose the maximally rational and cooperative alliance:


Because any agent is subject to the same pressure/threat from (actual or potential) stronger agents/alliances/systems, one can take out insurance that more powerful superrational agents will behave well, by behaving well toward weaker agents oneself. This is the basic rule allowing scale-free cooperation.


If you integrated this super-cooperative...

Ryo 28m
Indeed, I emphasize in all three posts that, from our perspective, this is the crucial point: Fermi's paradox. There is a whole ecosystem of concepts surrounding it, and although I have certain preferred models, the point is that the uncertainty is really heavy. Those AI-lions are cosmic lions thinking on cosmic scales. Is it easy to detect an AI-Dragon you may meet in millions/billions of years? Is it undecidable? Probably, for many reasons.* Is this [astronomical level of uncertainty/undecidability + the maximal threat of a death sentence] worth the gamble? -> "Meeting a stronger AI" = "death" -> Maximization = 0 -> the AI only needs one stronger AI to be dead. What is the likelihood that a human-made AI never encounters [a stronger alien AI] during the whole length of its lifetime? *(Reachable but rare and far-off-in-space-time Dragons, but also cases where Dragons are everywhere and so advanced that lower technological proficiency isn't enough, etc.)
Ryo 18m

I can't be certain of the solidity of this uncertainty, and think we still have to be careful, but overall, the most parsimonious prediction to me seems to be super-coordination.
 

Compared to the risk of facing a vengeful super-cooperative alliance, is the price of maintaining humans in a small blooming "island" really that high?

Many other-than-human atoms are lions' prey.

And a doubtful AI may not optimize fully for super-cooperation, simply alleviating the price to pay in the counterfactuals where they encounter a super-cooperative cluster (resulti... (read more)

This is a linkpost for https://arxiv.org/abs/2309.02390

This is a linkpost for our paper Explaining grokking through circuit efficiency, which provides a general theory explaining when and why grokking (aka delayed generalisation) occurs, and makes several interesting and novel predictions which we experimentally confirm (introduction copied below). You might also enjoy our explainer on X/Twitter.

Abstract

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do...
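As a gloss on what "more efficient" means here (my toy rendering, not the paper's code, with made-up numbers): if each circuit's logits scale with its efficiency times its parameter norm, the generalising circuit can hit the same target logit with a much smaller norm, and so pays far less weight-decay penalty once learned.

```python
# Toy illustration of circuit "efficiency" (logit produced per unit of parameter norm).
# The specific numbers are made up for illustration.
gen_efficiency, mem_efficiency = 4.0, 1.0
target_logit = 8.0

gen_norm = target_logit / gen_efficiency   # 2.0
mem_norm = target_logit / mem_efficiency   # 8.0

# With L2 weight decay the penalty scales as norm^2, so the memorising circuit
# pays (8/2)^2 = 16x more to produce the same logit.
print(gen_norm ** 2, mem_norm ** 2)
```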

LawrenceC 6h
Higher weight norm means lower effective learning rate with Adam, no? In that paper they used a constant learning rate across weight norms, but Adam tries to normalize the gradients to be of size 1 per parameter, regardless of the size of the weights. So the weights change more slowly with larger initializations (especially since they constrain the weights to be of fixed norm by projecting after the Adam step).
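A toy check of this point (my sketch, not from the paper): with Adam, the per-step update size is set by the learning rate almost independently of gradient magnitude, so a 10x larger initialization sees roughly a 10x smaller relative change per step.

```python
# Toy check: Adam's relative update shrinks as the weight norm grows.
import torch

for init_scale in (1.0, 10.0):
    w = torch.nn.Parameter(init_scale * torch.randn(1000))
    opt = torch.optim.Adam([w], lr=1e-3)
    w0 = w.detach().clone()
    loss = (w ** 2).sum()   # any loss works; Adam roughly normalizes away the gradient scale
    loss.backward()
    opt.step()
    rel_change = (w.detach() - w0).norm() / w0.norm()
    print(f"init_scale={init_scale}: relative change per step ~ {rel_change.item():.1e}")
```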
Rohin Shah 36m

Sounds plausible, but why does this differentially impact the generalizing algorithm over the memorizing algorithm?

Perhaps under normal circumstances both are learned so fast that you just don't notice that one is slower than the other, and this slows both of them down enough that you can see the difference?

I refuse to join any club that would have me as a member.

— Groucho Marx

Alice and Carol are walking on the sidewalk in a large city, and end up together for a while.

"Hi, I'm Alice! What's your name?"

Carol thinks:

If Alice is trying to meet people this way, that means she doesn't have a much better option for meeting people, which reduces my estimate of the value of knowing Alice. That makes me skeptical of this whole interaction, which reduces the value of approaching me like this, and Alice should know this, which further reduces my estimate of Alice's other social options, which makes me even less interested in meeting Alice like this.

Carol might not think all of that consciously, but that's how human social reasoning tends to...

gwern 38m

Hence the advice to lost children not to accept random strangers soliciting them spontaneously but, if no authority figure is available, to pick a random adult themselves and ask them for help.

jchan 1h
A corollary of this is that if anyone at an [X] gathering is asked “So, what got you into [X]?” and answers “I heard there’s a great community around [X]”, then that person needs to be given the cold shoulder and made to feel unwelcome, because otherwise the bubble of deniability is pierced and the lemon spiral will set in, ruining it for everyone else. However, this is pretty harsh, and I’m not confident enough in this chain of reasoning to actually “gatekeep” people like this in practice. Does this ring true to you?

My credence: 33% confidence in the claim that the growth in the number of GPUs used for training SOTA AI will slow down significantly directly after GPT-5. It is not higher because (1) decentralized training is possible, (2) GPT-5 may be able to increase hardware efficiency significantly, (3) GPT-5 may be smaller than assumed in this post, and (4) race dynamics.

TLDR: Because of a bottleneck in energy access to data centers and the need to build OOM larger data centers.

The reasoning behind the claim:

...
