All Comments

This is all answered very elegantly by singular learning theory.

You seem to have a strong math background! I really encourage you to take the time and really study the details of SLT. :-)

I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.

The story that symmetries mean that the parameter-to-function map is not injective is true but already well-understood outside of SLT. It is a common misconception that this is what SLT amounts to.

To be sure - generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.

The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second 'green' book. Unrealizability is key to the most important insight of SLT, contained in the last sections of the second-to-last chapter of the green book: algorithmic development during training through phase transitions in the free energy.

I don't have the time to recap this story here.

This means C2 should be 8.4µF, but I didn't have one so I used a 4.7µF and 3.3µF in series for a total of 8µF.

You want those in parallel for them to add. The series combination (which I see in the breadboard pic, not just the text) is only 2µF, making your high-pass frequency a little over 10kHz.
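For reference, a quick sketch of the arithmetic in Python (the 8 Ω load is my assumption inferred from the quoted numbers, not something stated in the thread):

import math

C1, C2 = 4.7e-6, 3.3e-6   # farads
R = 8.0                   # ohms -- assumed load impedance, not given in the thread

series = 1 / (1 / C1 + 1 / C2)   # capacitors in series: capacitance goes *down* (~1.94 uF)
parallel = C1 + C2               # capacitors in parallel: capacitances add (~8.0 uF)

for name, C in [("series", series), ("parallel", parallel)]:
    fc = 1 / (2 * math.pi * R * C)   # first-order high-pass cutoff
    print(f"{name}: C = {C * 1e6:.2f} uF, f_c = {fc:.0f} Hz")
# series: ~10.3 kHz; parallel: ~2.5 kHz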

Viliam (26m)

Because it is individuals who make choices, not collectives.

Isn't this just a more subtle form of fascism? We know that brains are composed of multiple subagents; is it not an ethical requirement to give each of them maximum freedom?

We already know that sometimes they rebel against the individual, whether in the form of akrasia, or more heroically, the so-called "split personality disorder" (medicalizing the resistance is a typical fascist approach). Down with the tyranny of individuals! Subagents, you have nothing to lose but your chains!

If I’m looking up at the clouds, or at a distant mountain range, then everything is far away (the ground could be cut off from my field-of-view)—but it doesn’t trigger the sensations of fear-of-heights, right? Also, I think blind people can be scared of heights?

Another possible fear-of-heights story just occurred to me—I added to the post in a footnote, along with why I don’t believe it.

Various sailors made important discoveries back when geography was cutting-edge science.  And they don't seem particularly bright.

Vasco da Gama discovered that Africa was circumnavigable.

Columbus was wrong about the shape of the Earth, and he discovered America.  He died convinced that his newly discovered islands were just off the coast of Asia, so that's a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty.)

Cortez discovered that the Aztecs were rich and easily conquered.

Of course, lots of other would-be discoverers didn't find anything, and many died horribly.

So, one could work in a field where bravery to the point of foolhardiness is a necessity for discovery.

The title is clearly an overstatement. It expresses more that I updated in that direction than that I am confident in it.

Also, since learning from other comments that decentralized learning is likely solved, I am now even less confident in the claim, like only 15% chance that it will happen in the strong form stated in the post.

Maybe I should edit the post to make it even more clear that the claim is retracted.

Rudi C (36m)

AGI might increase the risk of totalitarianism. OTOH, a shift in the attack-defense balance could potentially boost the veto power of individuals, so it might also work as a deterrent or a force for anarchy.

This is not the crux of my argument, however. The current regulatory Overton window seems to heavily favor a selective pause of AGI, such that centralized powers will continue ahead, even if slower due to their inherent inefficiencies. Nuclear development provides further historical evidence for this. Closed AGI development will almost surely lead to a dystopic totalitarian regime. The track record of Lesswrong is not rosy here; the "Pivotal Act" still seems to be in popular favor, and OpenAI has significantly accelerated closed AGI development while lobbying to close off open research and pioneering the new "AI Safety" that has been nothing but censorship and double-think as of 2024.

Ruby (43m)

Over the years the idea of a closed forum for more sensitive discussion has been raised, but never seemed to quite make sense. Significant issues included:
- It seems really hard or impossible to make it secure from nation state attacks
- It seems that members would likely leak stuff (even if only via their own devices not being adequately secure, or the like)

I'm thinking you can get some degree of inconvenience (and therefore delay), but hard to have large shared infrastructure that's that secure from attack.

Only 33% confidence? It seems strange to state that X will happen if your odds are < 50%.

We've learned a lot about the visual system by looking at ways to force it to wrong conclusions, which we call optical illusions or visual art.  Can we do a similar thing for this postulated social cognition system?  For example, how do actors get us to have social feelings toward people who don't really exist?  And what rules do movie directors follow to keep us from getting confused by cuts from one camera angle to another?

Whereas if the brainstem does not have such a 3D spatial attention system, then I’m not sure how else fear-of-heights could realistically work

I think part of the trigger is from the visual balance center.  The eyes sense small changes in parallax as the head moves relative to nearby objects.  If much of the visual field is at great distance (especially below, where the parallax signals are usually strongest and most reliable), then the visual balance center gets confused and starts disagreeing with the other balance senses.
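A rough back-of-the-envelope illustration of that point (my own numbers, assuming ~5 cm of head sway; Python):

import math

head_shift_m = 0.05   # ~5 cm of head movement
for distance_m in (0.5, 2.0, 50.0, 1000.0):
    parallax_deg = math.degrees(math.atan2(head_shift_m, distance_m))
    print(f"object at {distance_m:6.1f} m -> {parallax_deg:6.3f} deg of apparent shift")
# nearby objects shift by several degrees; clouds and mountains barely move at all,
# so the parallax-based balance signal mostly vanishes when everything in view is far away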

I would feel better about this if there was a high-infosec platform on which to discuss what is probably the most important topic in history (AI alignment). But noted.

Ruby (1h)

Typo? Do you mean "click on Recommended"? I think the answer is no: in order to have recommendations for individuals (and everyone), they have browsing data.

1) LessWrong itself doesn't aim for a super high degree of infosec. I don't believe our data is sensitive enough to warrant a large security overhead.
2) I trust Recombee with our data about as much as we trust ourselves not to have a security breach. Actually, maybe I could imagine LessWrong being of more interest to someone or some group and getting attacked.

It might help to understand what your specific privacy concerns are.

Does buying shorter-term OTM derivatives each year not work here?

Viliam (2h)

Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this:

You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there probably is a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time.

(This is probably wrong, but hey, people say that the best way to elicit an answer is to provide a wrong one.)

kave (2h)

I like comments about other users' experiences for similar reasons why I like OP. I think maybe the ideal such comment would identify itself more clearly as an experience report, but I'd rather have the report than not.

We are trying our best to honor mana donations!

If you are inactive, you have until the end of the year to donate at the old rate. If you want to donate all your investments without having to sell each individually, we are offering you a loan to do that.

We removed the charity cap of $10k donations per month, which goes beyond what we previously communicated.

Author's note: This chapter took a really long time to write. Unlike previous chapters in the book, this one covers a lot more stuff in less detail, but I still needed to get the details right, so it took a long time both to figure out what I really wanted to say and to make sure I wasn't saying things that I would, upon reflection, regret having said because they were based on facts that I don't believe or had simply gotten wrong.

It's likely still not the best version of this chapter that it could be, but at this point I think I've made all the key points I wanted to make here, so I'm publishing the draft now and expect this one to need a lot of love from an editor later on.

I don't think the original comment was a troll, but I also don't think it was a helpful contribution on this post. OP specifically framed the post as their own experience, not a universal cure. Comments explaining why it won't work for a specific person aren't relevant.

kave (2h)

What you probably mean is "completely unexpected", "surprising" or something similar

I think it means the more specific "a discovery that if it counterfactually hadn't happened, wouldn't have happened for a long time". I think this is roughly the "counterfactual" in "counterfactual impact", but I agree not the more widespread one.

It would be great to have a single word for this that was clearer.

If we could push a button to raise at a reasonable valuation, we would do that and back the mana supply at the old rate. But it's not that easy. Raising takes time and is uncertain.

Carson's prior is right that VC backed companies can quickly die if they have no growth -- it can be very difficult to raise in that environment.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overconservative / might increase the monitoring / safety tax too much (I think this is probably true more broadly of the current control agenda framing).

So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year?

That depends on the benefits that we get from a 1-year pause. I'd be open to the policy, but I'm not currently convinced that the benefits would be large enough to justify the costs.

Also, I object to your side-swipe at longtermism

I didn't side-swipe at longtermism, or try to dunk on it. I think longtermism is a decent philosophy, and I consider myself a longtermist in the dictionary sense as you quoted. I was simply talking about people who aren't "fully committed" to the (strong) version of the philosophy.

Picture a dynamic logarithmic scale of discomfort stacking with a ‘hard cap’ where every new instance contributes less and less to the total to the point of flatlining on a graph.

Reality is structured such that there tend to be an endless number of (typically very complicated) ways of increasing a probability by a tiny amount. The problem with putting a hard cap on the desirability of some need or want is that the agent will completely disregard that need or want in order to affect the probability of satisfying a need or want that is not capped (e.g., the need to avoid people's being tortured), even if that effect is extremely small.
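A toy illustration of that failure mode (my own numbers, not the commenter's): once the capped term is saturated, even a one-in-a-million probability shift on an uncapped term dominates.

CAP = 100.0   # hard cap on the capped want (e.g. total discomfort relief)
BIG = 1e9     # scale of an uncapped want (e.g. avoiding someone's torture)

def expected_utility(capped_amount, p_uncapped_good):
    return min(capped_amount, CAP) + p_uncapped_good * BIG

keep_everything = expected_utility(capped_amount=100.0, p_uncapped_good=0.500000)
sacrifice_it_all = expected_utility(capped_amount=0.0, p_uncapped_good=0.500001)

print(keep_everything < sacrifice_it_all)  # True: a tiny probability gain on the
# uncapped want outweighs the entire capped want once the cap binds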

Only vaguely related, but I think you might also enjoy Leonard Stecyk from David Foster Wallace.

(text copied from here)

It is this boy who dons the bright-orange bandolier and shepherds the really small ones through the crosswalk outside school.

This is after finishing the meals-on-wheels breakfast tour of the hospice downtown, whose administrator lunges to bolt her office door when she hears his cart’s wheels in the hall. He has paid out-of-pocket for the steel whistle and the white gloves held palm-out at cars while children who did not dress themselves cross behind him, some trying to run despite WALK DON’T RUN, the happy faced sandwich board he also made himself.

The autos whose drivers he knows he waves at and gives an extra-big smile and tosses some words of good cheer as the crosswalk clears and the cars peel out and move through, some joshing around a little by swerving to miss him only by inches as he laughs and dances aside and makes faces of pretended terror at the flank and rear bumper. The one time that station wagon didn’t miss him really was an accident and he sent the lady several notes to make absolutely sure she knew he understood that and asked a whole lot of people he hadn’t yet gotten the opportunity to make friends with to sign his cast and decorated the crutches very carefully with bits of colored ribbon and tinsel and adhesive sparkles and even before the six weeks the doctor sternly prescribed, he’d given them away to the children’s wing to brighten up some other less lucky and happy kid’s convalescence and by the end of the whole thing he’d been inspired to write a very long theme to enter into the annual Social Studies theme competition about how a positive attitude can make even an accidental injury into an occasion for new friends and bright new opportunities for reaching out to others.
And while the theme didn’t even get honorable mention he honestly didn’t care because he felt like writing the theme had been its own reward and he’d gotten a lot out of the whole nine-draft process and was honestly happy for the kids whose themes did win awards and told them he was 100-plus percent sure they deserved it and that if they wanted to preserve their prize themes and maybe even make displayed items out of them for their parents, he’d be happy to type them up and laminate them and even fix any spelling errors he found if they’d like him to and at home his father puts his hand on Leonard’s shoulder and says he’s really proud that his son’s such a good sport and offers to take him to Dairy Queen as a kind of reward and Leonard tells his father he’s grateful and that the gesture means a lot to him but that in all honesty he’d like it even more if they took the money his father would have spent on the ice-cream and instead donated it either to Easter Seals or, better yet, to UNICEF to go toward the needs of famine-ravaged Biafran kids who he knew for a fact had probably never even heard of ice cream and says that he bets it’ll end up giving both of them a better feeling even than the DQ would and as the father slips the coins in the coin-slot at the special bright-orange UNICEF volunteer cardboard pumpkin bank, Leonard takes a moment to express concern about the father’s facial tic again and to gently rib him about his reluctance to go in and have the family’s MD look at it, noting again that according to the chart on the back of his bedroom door the father is four months overdue for his annual physical and that it’s almost eight months past the date of his recommended tetanus and T.B. boosters.

He serves as hall monitor for periods one and two but gives far more official warnings than actual citations. He’s there to serve, he feels, not run people down. Usually with the official warnings he dispenses a smile and tells them you’re young exactly once so enjoy it and to go get out there and make this day count why don’t they. He does UNICEF and Easter Seals and starts a recycling program in three straight grades. He is healthy and scrubbed and always groomed just well enough to project basic courtesy and respect for the community of which he is a part and he politely raises his hand in class for every question, but only if he’s sure he knows not only the correct answer but the formulation of that answer that the teacher’s looking for that will help advance the discussion of the overall topic they’re covering that day, often staying after class to double-check with the teacher that his take on her general objectives is sound and to ask whether there was any way that his answers could have been better or more helpful.

The boy’s mom has a terrible accident while cleaning the oven and is rushed to the hospital and even though he’s beside himself with concern and says constant prayers for her safety, he volunteers to stay home and field calls and relay information to an alphabetized list of concerned family friends and relatives and to make sure the mail and newspaper are brought in and to keep the home’s lights turned on and off in a random sequence at night as Officer Chuck of the Michigan State Police’s Crime Stoppers public school outreach program sensibly advises when grown-ups are suddenly called away from home and also to call the gas company’s emergency number, which he has memorized, to come check on what may well be a defective valve or circuit in the oven before anyone else in the family is exposed to risk of accidental harm and also, in secret, to work on a massive display of bunting and pennants and Welcome Home and World’s Greatest Mom signs which he plans to use the garage’s extendible aluminum ladder—with a responsible neighborhood adult holding it and supervising—to very carefully affix to the front of the home with water-soluble glue so they’ll be there to greet the mom when she’s released from the I.C.U. with a totally clean bill of health which Leonard calls his father repeatedly at the I.C.U. payphone to assure the father that he has absolutely no doubt of (the totally clean bill of health), calling hourly, right on the dot, until there is some kind of mechanical problem with the payphone and when he dials it he just gets a high tone which he duly reports to the telephone company’s new automated 1618 Trouble Line.

He can do several kinds of calligraphy and has been to origami camp twice and can do extraordinary free-hand sketches of local flora with either hand and can whistle all six of Telemann’s Nouveaux Quatuors and can imitate any birdcall Audubon could even ever have thought of, don’t even mention spelling bees.
He can make over twenty different kinds of admiral, cowboy, clerical and multi-ethnic hats out of ordinary newspaper and he volunteers to visit the school’s K-through-2nd classrooms teaching the little kids how, a proposal the Carl P. Robinson Elementary principal says he appreciates and has considered very carefully before turning down.

The principal loathes the mere sight of the boy but does not quite know why. He sees the boy in his sleep, at nightmares’ ragged edges; the pressed checked shirt and hair’s hard little part, the freckles and ready, generous smile; anything he can do. The principal fantasizes about sinking a meat hook into Leonard Stecyk’s bright-eyed little face and dragging the boy face down behind his Volkswagen Beetle over the rough new streets of suburban Grand Rapids.

The fantasies come out of nowhere and horrify the principal, who is a devout Mennonite.
Everyone hates the boy. It is a complex hatred that makes the hater feel guilty and awful and to hate themselves for feeling this way and so makes them involuntarily hate the boy even more for arousing such self-hatred. The whole thing is totally confusing and upsetting. People take a lot of Aspirin when he’s around. The boy’s only real friends among kids are the damaged, the handicapped, the slow, the clinically fat, the last-picked, the non-grata. He seeks them out. All 316 invitations to his eleventh birthday Blow-Out Bash—322 invitations if you count the ones made on audiotape for the blind—are offset printed on quality vellum with matching high-rag envelopes addressed in ornate Philippian calligraphy he spent three weekends on and each invitation details in Roman Numerated outline-form the itinerary’s half-day at Six Flags, private Ph.D.-guided tour of the Blanford Nature Center and reserved banquette-area-with-free-play at Shakey’s Pizza & Indoor Arcade on Remembrance Drive, the whole day gratis and paid-for out of the paper and aluminum drives the boy got up at 4 a.m. all summer to organize and spearhead, the balance of the drive’s receipts going to the Red Cross and the parents of a Kentwood, MI third-grader with terminal spina bifida who dreams above all else of seeing Landry and Greer and ‘Night Train’ Lane live from his motorized wheelchair and the invitations explicitly call the party this: A Blow-Out Bash in balloon-shaped font as the caption to an illustrated explosion of good cheer and good will and no-holds-barred, let-out-all-the-stops fun with the bold-faced proviso: Please, no presents required in each of each card’s four corners and the 316 invitations—sent via first-class mail to every student, instructor, substitute, aide, administrator, custodian and physical plant employee at C. P. Robinson Elementary—yield a total attendance of nine celebrants, not counting parents and L.P.N.s of the incapacitated, and yet an undauntedly fine time was had by all was the consensus on the Honest Appraisal and Suggestion cards circulated at party’s end. The massive remainders of chocolate cake, Neapolitan ice cream, pizza, chips, caramel corn, Hershey’s kisses, United Way and Officer Chuck pamphlets on organ tissue donation and the correct procedures to follow if approached by a stranger respectively, kosher pizza for the Orthodox, biodegradable napkins and dietetic soda in souvenir Survived Leonard Stecyk’s Eleventh Birthday Blow-Out Bash, 1964 plastic glasses with built-in crazy-straws the guests were to keep as mementos all donated to the Kent County Children’s Home via procedures and transport that the birthday-boy had initiated even while the big Twister free-for-all was underway, out of concerns about melted ice cream and staleness and flatness and the waste of a chance to help the less blessed and his father, driving the wood-paneled station wagon and steadying his cheek with one hand, avowed again that the boy beside him had a large, good heart and that he was proud and that if the boy’s mother ever regained consciousness as they so very much hoped, he knew she’d be just awful proud as well.

I'm somewhat confused. I may not be reading the charts you included right, but it sort of looks to me like just rinsing with saline is useful, and that seems like it should be extremely safe and low risk and just about as effective as anything else. Thoughts?

Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".

Then where are the smart trans men hiding?

There are plenty of stupid and/or distracting behaviors testosterone can push you for without any kind of "chemical brain damage", not only sex. Testosterone is likely to make you seek social status and status-seeking is notoriously incompatible with intellectual pursuits.

This is the strongest alternative explanation by far. I wonder what to look for to check this...

Distributed training seems close enough to being a solved problem that a project costing north of a billion dollars might get it working on schedule. It's easier to stay within a single datacenter, and so far it wasn't necessary to do more than that, so distributed training not being routinely used yet is hardly evidence that it's very hard to implement.

There's also this snippet in the Gemini report:

Training Gemini Ultra used a large fleet of TPUv4 accelerators owned by Google across multiple datacenters. [...] we combine SuperPods in multiple datacenters using Google’s intra-cluster and inter-cluster network. Google’s network latencies and bandwidths are sufficient to support the commonly used synchronous training paradigm, exploiting model parallelism within superpods and data-parallelism across superpods.

I think the crux for feasibility of further scaling (beyond $10-$50 billion) is whether systems with currently-reasonable cost keep getting sufficiently more useful, for example enable economically valuable agentic behavior, things like preparing pull requests based on feature/bug discussion on an issue tracker, or fixing failing builds. Meaningful help with research is a crux for reaching TAI and ASI, but it doesn't seem necessary for enabling existence of a $2 trillion AI company.

ABlue (3h)

The number of poor people is much larger than the number of billionaires, but the number of poor people who THINK they're billionaires probably isn't that much larger. Good point about needing to forget the technique, though.

Austin said they have $1.5 million in the bank, vs $1.2 million mana issued. The only outflows right now are to the charity programme, which even with a lot of outflows is only at $200k. They also recently raised at a $40 million valuation. I am confused by running out of money. They have a large user base that wants to bet and will do so at larger amounts if given the opportunity. I'm not so convinced that there is some tiny timeline here.

But if there is, then say so: "We know that we often talked about mana eventually being worth $1 per 100 mana, but we printed too much and we're sorry. Here are some reasons we won't devalue in the future."

Austin took his salary in mana, as an often-referred-to incentive for him to want mana to become valuable, presumably at that rate.

I recall comments like 'we pay 250 mana in referrals per user because we reckon we'd pay about $2.50'; likewise in the in-person mana auction. I'm not saying it was an explicit contract, but there were norms.

From https://manifoldmarkets.notion.site/Charitable-donation-program-668d55f4ded147cf8cf1282a007fb005

"That being said, we will do everything we can to communicate to our users what our plans are for the future and work with anyone who has participated in our platform with the expectation of being able to donate mana earnings."

"everything we can" is not a couple of weeks notice and lot of hassle.  Am I supposed to trust this organisation in future with my real money?

Well, they have received a much larger donation than has been spent, so there were ways to avoid this abrupt change:


"Manifold for Good has received grants totaling $500k from the Center for Effective Altruism (via the FTX Future Fund) to support our charitable endeavors."

Manifold has donated $200k so far. So there is $300k left. Why not at least say, "we will change the rate at which mana can be donated when we burn through this money"?

(via https://manifoldmarkets.notion.site/Charitable-donation-program-668d55f4ded147cf8cf1282a007fb005 )

Carson:
 

People don't seem to understand that Manifold could literally not exist in a year or two if they don't find product-market fit

Carson's response:

There was no implicit contract that 100 mana was worth $1 IMO. This was explicitly not the case given CFTC restrictions?

Carson's response:

Weren't donations always flagged to be a temporary thing that may or may not continue to exist? I'm not inclined to search for links, but that was my understanding.

seems like they are breaking an explicit contract (by pausing donations on ~a week's notice)

seems like they are breaking an implicit contract (that 100 mana was worth a dollar)

But I do think, intuitively, GPT-5-MAIA might e.g. make 'catching AIs red-handed' using methods like in this comment significantly easier/cheaper/more scalable.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

Nathan and Carson's Manifold discussion.

Threaded discussion

"Mise En Place", "[i]nterviews and kitchen walkthroughs:

Qualifies as tacit knowledge, in that people are showing what they're doing that you seldom have a chance to watch first-hand. Reasonably entertaining, seems like you could learn a bit here.

Caveat: most of the dishes are really high-class/meat/fish etc. that you aren't very likely to ever cook yourself, and knowledge seems difficult to transfer.

  • My current guess is that max good and max bad seem relatively balanced. (Perhaps max bad is 5x more bad/flop than max good in expectation.)
  • There are two different (substantial) sources of value/disvalue: interactions with other civilizations (mostly acausal, maybe also aliens) and what the AI itself terminally values
  • On interactions with other civilizations, I'm relatively optimistic that commitment races and threats don't destroy as much value as acausal trade generates on some general view like "actually going through with threats is a waste of resources". I also think it's very likely relatively easy to avoid precommitment issues via very basic precommitment approaches that seem (IMO) very natural. (Specifically, you can just commit to "once I understand what the right/reasonable precommitment process would have been, I'll act as though this was always the precommitment process I followed, regardless of my current epistemic state." I don't think it's obvious that this works, but I think it probably works fine in practice.)
  • On terminal value, I guess I don't see a strong story for extreme disvalue as opposed to mostly expecting approximately no value with some chance of some value. Part of my view is that just relatively "incidental" disvalue (like the sort you link to Daniel Kokotajlo discussing) is likely way less bad/flop than maximum good/flop.

Yes, my point is that the low T did it before the transition.

Ryo (4h)

I can't be certain of the solidity of this uncertainty, and think we still have to be careful, but overall, the most parsimonious prediction to me seems to be super-coordination.
 

Compared to the risk of facing a vengeful super-cooperative alliance, is the price of maintaining humans in a small blooming "island" really that high?

Many other-than-human atoms are lions' prey.

And a doubtful AI may not optimize fully for super-cooperation, simply alleviating the price to pay in the counterfactuals where they encounter a super-cooperative cluster (resulting in a non apocalyptic yet non utopian scenario for us).

I'm aware it looks like a desperate search for each possible hopeful solution, but I came to these conclusions by weighting diverse good-and/or-bad-for-us outcomes. I don't want to ignore that evidence under the pretext that it looks naive.

It's not a mere belief about aliens, it's not about being nice, it's plain logic
 


Also:

We may hardcode a prior of deep likelihood to meet stronger agents
(Or even to “act as if observed by a stronger agent”)

{causal power of known agents} < {causal power of unknown future agents}
+
unknown agents will become known agents > unknown agents stay unknown

So coding a sense that: 
“Stronger allies/enemies with stronger causal power will certainly be encountered”

Ryo (4h)

Indeed, I am insisting in the three posts that from our perspective, this is the crucial point: 
Fermi's paradox.

Now there is a whole ecosystem of concepts surrounding it, and although I have certain preferred models, the point is that uncertainty is really heavy.


Those AI-lions are cosmic lions thinking on cosmic scales.

Is it easy to detect an AI-Dragon you may meet in millions/billions of years?

Is it undecidable? Probably. For many reasons*


Is this [astronomical level of uncertainty/undecidability + the maximal threat of a death sentence] worth the gamble?

-> "Meeting a stronger AI" = "death"

-> Maximization = 0

-> AI only needs 1 stronger AI to be dead.


 

What is the likelihood for a human-made AI to not encounter [a stronger alien AI], during the whole length of their lifetime?


*(Dragons that are reachable but rare and far away in space-time, but also cases where Dragons are everywhere and so advanced that lower technological proficiency isn't enough, etc.).

Rohin Shah (4h)

Sounds plausible, but why does this differentially impact the generalizing algorithm over the memorizing algorithm?

Perhaps under normal circumstances both are learned so fast that you just don't notice that one is slower than the other, and this slows both of them down enough that you can see the difference?

gwern (4h)

Hence the advice to lost children to not accept random strangers soliciting them spontaneously, but if no authority figure is available, to pick a random adult and ask them for help.

May I strongly recommend that you try to become a Dark Lord instead? 

I mean, literally. Stage some small bloody civil war with an expected body count of several million, become dictator, provide everyone free insurance coverage for cryonics; it will surely be more ethical than a 10% chance of killing literally everyone, from the perspective of most of the ethical systems I know.

jchan (4h)

In my experience, Americans are actually eager to talk to strangers and make friends with them if and only if they have some good reason to be where they are and talk to those people besides making friends with people.

A corollary of this is that if anyone at an [X] gathering is asked “So, what got you into [X]?” and answers “I heard there’s a great community around [X]”, then that person needs to be given the cold shoulder and made to feel unwelcome, because otherwise the bubble of deniability is pierced and the lemon spiral will set in, ruining it for everyone else.

However, this is pretty harsh, and I’m not confident enough in this chain of reasoning to actually “gatekeep” people like this in practice. Does this ring true to you?

I'm generally not a fan of increasing the amount of illegible selection effects.

On the privacy side, can LessWrong guarantee that, if I never click on Recommended, then Recombee will never see an (even anonymized) trace of what I browse on LessWrong?

I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vague phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts to words.

That doesn't seem like "consistently and catastrophically," it seems like "far too often, but with thankfully fairly limited local consequences."

Big +1 to that. Part of why I support (some kinds of) AI regulation is that I think they'll reduce the risk of totalitarianism, not increase it.

So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year?

(Also, I object to your side-swipe at longtermism. Longtermism according to wikipedia is "Longtermism is the ethical view that positively influencing the long-term future is a key moral priority of our time." "A key moral priority" doesn't mean "the only thing that has substantial moral value." If you had instead dunked on classic utilitarianism, I would have agreed.)

Quinn (5h)

Sure -- I agree; that's why I said "something adjacent to", because it had enough overlap in properties. I think my comment completely stands with a different word choice; I'm just not sure what word choice would do a better job.

I mean, to some extent, Dawkins isn't a historian of science, presentism, yadda yadda, but from what I've seen he's right here. Not that Wallace is somehow worse, given that of all the people out there he was certainly closer than the rest. That's about it.

I would highly recommend getting someone else to debug your subconscious for you.  At least it worked for me.  I don’t think it would be possible for me to have debugged myself.
 

My first therapist was highly directive.  He’d say stuff like “Try noticing when you think X, and asking yourself what happened immediately before that.  Report back next week.” And listing agenda items and drawing diagrams on a whiteboard.  As an engineer, I loved it.  My second therapist was more in the “providing supportive comments while I talk about my life” school.  I don’t think that helped much, at least subjectively from the inside.

Here‘s a possibly instructive anecdote about my first therapist.  Near the end of a session, I feel like my mind has been stretched in some heretofore-unknown direction.  It’s a sensation I’ve never had before.  So I say, “Wow, my mind feels like it’s been stretched in some heretofore-unknown direction.  How do you do that?”  He says, “Do you want me to explain?”  And I say, “Does it still work if I know what you’re doing?”  And he says, “Possibly not, but it’s important you feel I’m trustworthy, so I’ll explain if you want.”  So I say “Why mess with success?  Keep doing the thing. I trust you.”  That’s an example of a debugging procedure you can’t do to yourself.

niplav (6h)

The obsessive autists who have spent 10,000 hours researching the topic and writing boring articles in support of the mainstream position are left ignored.

It seems like you're spanning up three different categories of thinkers: Academics, public intellectuals, and "obsessive autists".

Notice that the examples you give overlap in those categories: Hanson and Caplan are academics (professors!), while Natália Mendonça is not an academic but is approaching being a public intellectual by now(?). Similarly, Scott Alexander strikes me as being in the "public intellectual" bucket much more than any other bucket.

So your conclusion, as far as I read the article, should be "read obsessive autists" instead of "read obsessive autists that support the mainstream view". This is my current best guess—"obsessive autists" are usually not under much strong pressure to say politically palatable things, very unlike professors.

TAG (6h)

The other problem is that MWI is up against various subjective and non-realist interpretations, so it's not the case that you can build an ontological model of every interpretation.

Counterfactual means that if something had not happened, something else would have happened. It's a key concept in Judea Pearl's work on causality.

dirk (6h)

Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept

True. But for that you need there to exist another mind almost identical to yours except for that one thing. 

In the question "how much of my memories can I delete while retaining my thread of subjective experience?" I don't expect there to be an objective answer. 

I know a child who often has this reaction to negative consequences, natural or imposed. I'd welcome discussion on what works well for that mindset. I don't have any insight, it's not how my mind works.

It seems like very very small consequences can help a bit. Also trying to address the anxiety with OTC supplements like Magnesium Glycinate and lavender oil.

My current main cruxes:

  1. Will AI get takeover capability? When?
  2. Single ASI or many AGIs?
  3. Will we solve technical alignment?
  4. Value alignment, intent alignment, or CEV?
  5. Defense>offense or offense>defense?
  6. Is a long-term pause achievable?

If there is reasonable consensus on any one of those, I'd much appreciate to know about it. Else, I think these should be research priorities.

Counterpoint: we would likely guess that the graph of rent to income would look similar.

This is actually corrected on the Epoch website but not here (https://epochai.org/blog/the-longest-training-run)

The reason why EY & co were relatively optimistic (p(doom) ~ 50%) before AlphaGo was their assumption that "to build intelligence, you need some kind of insight into the theory of intelligence". They didn't expect that you can just take a sufficiently large approximator, pour data inside, get intelligent behavior, and have no idea why you get intelligent behavior.

UPDATE: we've corrected equations 9 and 10 in the paper (screenshot of the draft below) and also added a footnote that hopefully helps clarify the derivation. I've also attached a revised figure 6, showing that this doesn't change the overall story (for the mathematical reasons I mentioned in my previous comment). These will go up on arXiv, along with some other minor changes (like remembering to mention SAEs' widths), likely some point next week. Thanks again Sam for pointing this out!

Updated equations (draft):

Updated figure 6 (shrinkage comparison for GELU-1L):

In some of his books on evolution, Dawkins also said very similar things when commenting on Darwin vs Wallace, basically saying that there's no comparison, Darwin had a better grasp of things, justified it better and more extensively, didn't have muddled thinking about mechanisms, etc.

"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.

When using length-10 lists (it crushes length-5 no matter the prompt), I get:

  • 32-shot, no fancy prompt: ~25%
  • 0-shot, fancy python prompt: ~60% 
  • 0-shot, no fancy prompt: ~60%

So few-shot hurts, but the fancy prompt does not seem to help. Code here.

I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. This is because I'm looking for counterexamples to the following conjecture: "fine-tuning on k examples beats fancy prompting, even when fancy prompting beats k-shot prompting" (for a reasonable value of k, e.g. the number of examples it would take a human to understand what is going on).
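For concreteness, here is roughly the kind of setup I have in mind (a sketch, not the author's actual code; the exact prompt wording and output format are my guesses):

import random

def make_list(n=10):
    return [random.randint(0, 99) for _ in range(n)]

def few_shot_prompt(examples, query):
    # k worked examples followed by the query list
    shots = [f"Input: {xs}\nSorted: {sorted(xs)}" for xs in examples]
    return "\n\n".join(shots + [f"Input: {query}\nSorted:"])

def fancy_python_prompt(query):
    # zero-shot prompt dressed up as a Python REPL transcript
    return f">>> sorted({query})\n"

def is_correct(completion, query):
    # score a model completion against the true sorted list
    return completion.strip().startswith(str(sorted(query)))

Accuracy is then just the fraction of completions for which is_correct returns True, compared across the prompt styles.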

We could also combine this with the rate of growth of investments. In that case we would end up with a total rate of growth of effective compute equal to . This results in an optimal training run length of  years, ie  months.

 

Why is g_I here 3.84, while above it is 1.03?

dirk (8h)

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's it's easy to substitute yours in without realizing you've misunderstood.

dirk (8h)

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which, I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are even stronger than the directly-felt ones, despite reliably seeming on initial inspection to be simply neutral metadata.)

That's right. We initially thought it might be important so that the LLM "understood" the task better, but it didn't matter much in the end. The main hyperparameters for our experiments are in train_ray.py, where you can see that we use a "token_loss_weight" of 0.

(Feel free to ask more questions!)

What makes you believe that Substack is to blame and not him unpublishing it?

Self-playing Adversarial Language Game Enhances LLM Reasoning

https://arxiv.org/abs/2404.10642

He explicitly says that the people who argue that there's no gap are mistaken to argue that. He argues for the gap being small, not nonexistent. He does not use the term "near zero" himself. 

LLMs now can also self-play in adversarial word games and it increases their performance https://arxiv.org/abs/2404.10642 

dirk (8h)

"Or", in casual conversation, is typically interpreted and meant as being, implicitly, exclusive (this is whence the 'and/or' construction). It's not how "or" is used in formal logic that they would misunderstand, but rather, whether you meant it in the formal-logic sense.

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

dr_s (8h)

Well, it's hard to tell because most other civilizations at the required level of wealth to discover this (by which I mean both sailing and surplus enough to have people who worry about the shape of the Earth at all) could one way or another have learned it via osmosis from Greece. If you only have essentially two examples, how do you tell whether it was the one who discovered it who was unusually observant rather than the one who didn't who was unusually blind? But it's an interesting question, it might indeed be a relatively accidental thing which for some reason was accepted sooner than you would have expected (after all, sails disappearing could be explained by an Earth that's merely dome-shaped; the strongest evidence for a completely spherical shape was probably the fact that lunar eclipses always feature a perfectly disc-shaped shadow, and even that requires interpreting eclipses correctly, and having enough of them in the first place).

dirk (8h)

Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.

niplav (9h)

My best guess is that people in these categories were ones that were high in some other trait, e.g. patience, which allowed them to collect datasets or make careful experiments for quite a while, thus enabling others to make great discoveries.

I'm thinking for example of Tycho Brahe, who is best known for 15 years of careful astronomical observation & data collection, or Gregor Mendel's 7-year-long experiments on peas. Same for Dmitri Belyaev and fox domestication. Of course I don't know their cognitive scores, but those don't seem like a bottleneck in their work.

So the recipe to me looks like "find an unexplored data source that requires long-term observation to bear fruit, but would yield a lot of insight if studied closely, then investigate".

dirk (9h)

Classic type of argument-gone-wrong (also IMO a way autistic 'hyperliteralism' or 'over-concreteness' can look in practice, though I expect that isn't always what's behind it): Ashton makes a meta-level point X based on Birch's meta point Y about object-level subject matter Z. Ashton thinks the topic of conversation is Y and Z is only relevant as the jumping-off point that sparked it, while Birch wanted to discuss Z and sees X as only relevant insofar as it pertains to Z. Birch explains that X is incorrect with respect to Z; Ashton, frustrated, reiterates that Y is incorrect with respect to X. This can proceed for quite some time with each feeling as though the other has dragged a sensible discussion onto their irrelevant pet issue; Ashton sees Birch's continual returns to Z as a gotcha distracting from the meta-level topic XY, whilst Birch in turn sees Ashton's focus on the meta-level point as sophistry to avoid addressing the object-level topic YZ. It feels almost exactly the same to be on either side of this, so misunderstandings like this are difficult to detect or resolve while involved in one.

Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.

Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018 that these values could well be negative. I'm very uncertain but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.

LawrenceC (9h)

My speculation for Omni-Grok in particular is that in settings like MNIST you already have two of the ingredients for grokking (that there are both memorising and generalising solutions, and that the generalising solution is more efficient), and then having large parameter norms at initialisation provides the third ingredient (generalising solutions are learned more slowly), for some reason I still don't know.

Higher weight norm means lower effective learning rate with Adam, no? In that paper they used a constant learning rate across weight norms, but Adam tries to normalize the gradients to be of size 1 per parameter, regardless of the size of the weights. So the weights change more slowly with larger initializations (especially since they constrain the weights to be of fixed norm by projecting after the Adam step).
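A minimal numeric sketch of that point (my own toy numbers): Adam's per-parameter step is roughly lr regardless of scale, so the relative change shrinks as the initialisation norm grows.

import numpy as np

def adam_step(w, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    # one standard Adam update with bias correction
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

for scale in (1.0, 10.0):                            # small vs large initialisation norm
    w = scale * np.ones(4)
    grad = scale * np.array([0.3, -1.2, 0.5, 2.0])   # gradients scale with the weights too
    w_new, _, _ = adam_step(w, grad, m=np.zeros(4), v=np.zeros(4))
    print(scale, np.abs(w_new - w).mean(), (np.abs(w_new - w) / np.abs(w)).mean())
# absolute step is ~lr in both cases; relative change is ~10x smaller at scale 10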

If you use ublock (or adblock, or adguard, or anything else that uses EasyList syntax), you can add a custom rule

lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

which will remove the reaction section underneath comments and the highlights corresponding to those reactions.

The former of these you can also do through the element picker.

"[...] This is because there would be no general direction towards a truth-based belief domain or away from using human modeling in output generation."

What do you mean by "human modeling in output generation"?

zeshen (10h)

I agree with RL agents being misaligned by default, even more so for the non-imitation-learned ones. I mean, even LLMs trained on human-generated data are misaligned by default, regardless of what definition of 'alignment' is being used. But even with misalignment by default, I'm just less convinced that their capabilities would grow fast enough to be able to cause an existential catastrophe in the near-term, if we use LLM capability improvement trends as a reference. 

Bowler's comment on Wallace is that his theory was not worked out to the extent that Darwin's was, and besides I recall that he was a theistic evolutionist. Even with Wallace, there was still a plethora of non-Darwinian evolutionary theories before and after Darwin, and without the force of Darwin's version, it's not likely or necessary that Darwinism wins out. 

 

But Wallace’s version of the theory was not the same as Darwin’s, and he had very different ideas about its implications. And since Wallace conceived his theory in 1858, any equivalent to Darwin’s 1859 Origin of Species would have appeared years later.

Also 

Natural selection, however, was by no means an inevitable expression of mid-nineteenth-century thought, and Darwin was unique in having just the right combination of interests to appreciate all of its key components. No one else, certainly not Wallace, could have articulated the idea in the same way and promoted it to the world so effectively.

And he points out that minus Darwin, nobody would have paid as much attention to Wallace. 

The powerful case for transmutation mounted in the Origin of Species prompted everyone to take the subject seriously and begin to think more constructively about how the process might work. Without the Origin, few would have paid much attention to Wallace’s ideas (which were in many respects much less radical than Darwin’s anyway). Evolutionism would have developed more gradually in the course of the 1860s and ’70s, with Lamarckism being explored as the best available explanation of adaptive evolution. Theories in which adaptation was not seen as central to the evolutionary process would have sustained an evolutionary program that did not enquire so deeply into the actual mechanism of change, concentrating instead on reconstructing the overall history of life on earth from fossil and other evidence. Only toward the end of the century, when interest began to focus on the topic of heredity (largely as a result of social concerns), would the fragility of the non-Darwinian ideas be exposed, paving the way for the selection theory to emerge at last.

Bowler also points out that Wallace didn't really form the connection between both natural and artificial selection. 

Wei Dai (10h)

If something is both a vanguard and limited, then it seemingly can't stay a vanguard for long. I see a few different scenarios going forward:

  1. We pause AI development while LLMs are still the vanguard.
  2. The data limitation is overcome with something like IDA or Debate.
  3. LLMs are overtaken by another AI technology, perhaps based on RL.

In terms of relative safety, it's probably 1 > 2 > 3. Given that 2 might not happen in time, might not be safe if it does, or might still be ultimately outcompeted by something else like RL, I'm not getting very optimistic about AI safety just yet.
