A book review examining Elinor Ostrom's "Governing the Commons" in light of Eliezer Yudkowsky's "Inadequate Equilibria." Are successful local institutions for governing common-pool resources possible without government intervention? Under what circumstances can such institutions emerge spontaneously to solve coordination problems?

robo
Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish

The "Big Stupid" of the AI doomers 2013-2023 was that AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs". Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche. When the public turned out to be somewhat receptive to the idea of regulating AIs, doomers were unprepared.

Take: The "Big Stupid" of right now is still the same thing. (We've not corrected enough.) Between now and transformative AGI we are likely to encounter a moment where 40% of people realize AIs really could take over (say, if every month another 1% of the population loses their job). If 40% of the world were as scared of AI loss-of-control as you, what could the world do? I think a lot! Do we have a plan for then?

Almost every LessWrong post on AIs is about analyzing AIs. Almost none are about how, given widespread public support, people/governments could stop bad AIs from being built. [Example: if 40% of people were as worried about AI as I was, the US would treat GPU manufacture like uranium enrichment. And fortunately GPU manufacture is hundreds of times harder than uranium enrichment! We should be nerding out researching integrated circuit supply chains, choke points, foundry logistics in jurisdictions the US can't unilaterally sanction, that sort of thing.]

TLDR: stopping deadly AIs from being built needs less research on AIs and more research on how to stop AIs from being built.

*My research included 😬
Very Spicy Take

Epistemic Note: Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn't be allowed anywhere near AI safety decision making in the future.

To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario's sister Daniela."
Akash
My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc. Some quick thoughts:

* Soft power– I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
* Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
* People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
* So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
* Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
* Status games/reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as "unilateralist", "not serious", and "untrustworthy" in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
* Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having "good judgment". This is sensible insofar as we want people with good judgment (who wouldn't?), but this often ends up being so subjective that it leads to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
* Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it's plausible I would've had a similar feeling but in the "reverse direction".

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs). Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K.
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

Popular Comments

Recent Discussion

Epistemic status: very shallow Google Scholar dive. Intended mostly as trailheads for people to follow up on their own.

previously: https://www.lesswrong.com/posts/h6kChrecznGD4ikqv/increasing-iq-is-trivial

I don't know to what degree this will wind up being a constraint. But given that many of the things that help in this domain have independent lines of evidence for benefit it seems worth collecting.

Food

dark chocolate, beets, blueberries, fish, eggs. I've had good effects with strong hibiscus and mint tea (both vasodilators).

Exercise

Regular cardio, stretching/yoga, going for daily walks.

Learning

Meditation, math, music, enjoyable hobbies with a learning component.

Light therapy

Unknown effect size, but increasingly cheap to test over the last few years. I was able to get Too Many lumens for under $50. Sun exposure has a larger effect size here, so exercising outside is helpful.

Cold exposure

this might mostly...

Update: I resolved maybe all of my neck tension and vagus nerve tension. I don't know how to tell whether this increased my intelligence, though. It's also not like I had headaches or anything obvious like that before.

Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow

One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more.


Introduction

Every time we sit down in front of an LLM like GPT-4, it starts with a blank slate. It knows nothing[1] about who we are, other than what it knows about users in general. But with every word we type, we reveal more about ourselves -- our beliefs, our personality, our education level, even our gender. Just how clearly does the model see us by the end of the conversation, and why should that worry us?

Like many, we were rather startled when @janus showed...

If you are using Llama, you can use https://github.com/wassname/prob_jsonformer, or snippets of the code, to get probabilities over a selection of tokens.
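For readers who want the underlying idea without the library: below is a minimal sketch (not prob_jsonformer's actual API) of scoring a handful of candidate tokens with a HuggingFace causal LM. The model name and candidate strings are illustrative assumptions.

```python
# Minimal sketch, assuming a HuggingFace causal LM; not prob_jsonformer's API.
# Model name and candidate tokens are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The author of this essay is most likely"
candidates = [" male", " female"]  # the selection of tokens to score

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits at the last position
probs = torch.softmax(logits, dim=-1)

for cand in candidates:
    # take the first sub-token of each candidate; fine for short single-word options
    tok_id = tokenizer.encode(cand, add_special_tokens=False)[0]
    print(f"{cand!r}: {probs[tok_id].item():.3f}")
```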

gwern
Oh, that seems easy enough. People might think that they are safe as long as they don't write as much as I or Scott do under a few names, but that's not true. If you have any writing samples at all, you just stick the list of them into a prompt and ask about similarity. Even if you have a lot of writing, context windows are now millions of tokens long, so you can stick an entire book (or three) of writing into a context window.

And remember, the longer the context window, the more that the 'prompt' is simply an inefficient form of pretraining, where you create the hidden state of an RNN for millions of timesteps, meta-learning the new task, and then throw it away. (Although note even there that Google has a new 'caching' feature which lets you run the same prompt multiple times, essentially reinventing caching RNN hidden states.) So when you stick corpuses into a long prompt, you are essentially pretraining the LLM some more, and making it as capable of identifying a new author as it is capable of already identifying 'gwern' or 'Scott Alexander'.

So, you would simply do something like put in a list of (author, sample) as well as any additional metadata convenient like biographies, then 'unknown sample', and ask, 'rank the authors by how likely they are to have written that final sample by an unknown author'. This depends on having a short list of authors which can fit in the prompt (the shorter the samples, the more you can fit, but the worse the prediction), but it's not hard to imagine how to generalize this to an entire list.

You can think of it as a noisy sorting problem or a best-arm finding problem. Just break up your entire list of n authors into groups of m, and start running the identification prompt, which will not cost n log n prompts because you're not sorting the entire list, you are only finding the min/max (which is roughly linear). For many purposes, it would be acceptable to pay a few dozen dollars to dox an author out of a list of a few thousand.
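A rough sketch of the group-tournament scheme gwern describes: split the author list into groups, keep the best match from each group, and repeat. Here `best_author_in_group` is a hypothetical placeholder for the ranking prompt; the total number of prompt calls is roughly n/(m-1), i.e. linear in n.

```python
# Sketch of the linear-cost tournament over n authors in groups of size m.
# `best_author_in_group` is a hypothetical wrapper around an LLM ranking prompt.

def best_author_in_group(unknown_sample: str, group: list[tuple[str, str]]) -> tuple[str, str]:
    """Ask the LLM to rank (author, sample) pairs against the unknown sample
    and return the most likely match. Placeholder: plug in your prompt call."""
    raise NotImplementedError("call your LLM ranking prompt here")

def identify_author(unknown_sample: str, authors: list[tuple[str, str]], group_size: int = 20) -> str:
    candidates = list(authors)  # (author_name, writing_sample) pairs
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates), group_size):
            group = candidates[i:i + group_size]
            next_round.append(best_author_in_group(unknown_sample, group))
        candidates = next_round  # winners of this round advance
    return candidates[0][0]
```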
eggsyntax
Will read this in detail later when I can, but on first skim -- I've seen you draw that conclusion in earlier comments. Are you assuming you yourself will finally be deanonymized soon? No pressure to answer, of course; it's a pretty personal question, and answering might itself give away a bit or two.
gwern
I can be deanonymized in other ways more easily. I write these as warnings to other people who might think that it is still adequate to simply use a pseudonym and write exclusively in text and not make the obvious OPSEC mistakes, and so you can safely write under multiple names. It is not, because you will have already lost in a few years. Regrettable as it is, if you wish to write anything online which might invite persecution over the next few years or lead activists to try to dox you - if you are, say, blowing a whistle at a sophisticated megacorp company with the most punitive NDAs & equity policies in the industry - you would be well-advised to start laundering your writings through an LLM yesterday, despite the deplorable effects on style. Truesight will only get keener and flense away more of the security by obscurity we so take for granted, because "attacks only get better".
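For concreteness, a minimal sketch of what "laundering your writings through an LLM" could look like in practice, assuming the OpenAI Python client; the model name and prompt wording are illustrative, not a recommendation of any particular service.

```python
# Illustrative only: rewrite a draft through an LLM to blur stylometric signal.
# Assumes the OpenAI Python client; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def launder(text: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's text so the meaning is preserved "
                        "but the sentence structure, vocabulary, and rhythm "
                        "are as generic as possible."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```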

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, Sarah, and @Guillaume Corlouer for suggestions on this writeup.

Introduction

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because

  • We have a formalism that relates training data to internal
...
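As a toy illustration of the belief-updating picture sketched above: if the data-generating process is HMM-like, the "meta-dynamics" is just Bayesian filtering over its hidden states. The transition and emission matrices below are made up for illustration.

```python
# Toy sketch: Bayesian belief updating over hidden states of an HMM-like
# data-generating process. T and E are made-up matrices for illustration.
import numpy as np

T = np.array([[0.9, 0.1],      # T[s, s']: hidden-state transition probabilities
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],      # E[s', x]: probability of emitting token x from state s'
              [0.1, 0.9]])

def update_belief(belief: np.ndarray, token: int) -> np.ndarray:
    """One step of the belief dynamics: predict the next hidden state, then condition on the token."""
    predicted = belief @ T                 # prior over the next hidden state
    unnormalized = predicted * E[:, token]
    return unnormalized / unnormalized.sum()

belief = np.array([0.5, 0.5])              # uniform prior over hidden states
for tok in [0, 1, 1, 0]:                   # an example token sequence
    belief = update_belief(belief, tok)
    print(belief)
```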

this post seems like a win for PIBBSS gee

If LW takes this route, it should be cognizant of the usual challenges of getting involved in politics. I think there's a very good chance of evaporative cooling, where people trying to see AI clearly gradually leave, and are replaced by activists. The current reaction to OpenAI events is already seeming fairly tribal IMO.

Akash
Oh good point– I think my original phrasing was too broad. I didn't mean to suggest that there were no high-quality policy discussions on LW, moreso meant to claim that the proportion/frequency of policy content is relatively limited. I've edited to reflect a more precise claim: (I haven't seen much from Scott or Robin about AI policy topics recently– agree that Zvi's posts have been helpful.) (I also don't know of many public places that have good AI policy discussions. I do think the difference in quality between "public discussions" and "private discussions" is quite high in policy. I'm not quite sure what the difference looks like for people who are deep into technical research, but it seems likely to me that policy culture is more private/secretive than technical culture.)
Thomas Kwa
Seems reasonable except that Eliezer's p(doom | trying to solve alignment) in early 2023 was much higher than 50%, probably more like 98%. AGI Ruin was published in June 2022 and drafts existed since early 2022. MIRI leadership had been pretty pessimistic ever since AlphaGo in 2016 and especially since their research agenda collapsed in 2019.
quetzal_rainbow
I am talking about belief state in ~2015, because everyone was already skeptical about policy approach at that time.

This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.

Duke Arado’s obsession with physics-defying architecture has caused him to run into a small problem. His problem is not – he affirms – that his interest has in any way waned: the menagerie of fantastical buildings which dot his territories attest to this, and he treasures each new time-bending tower or non-Euclidean mansion as much as the first. Nor – he assuages – is it that he’s having trouble finding talent: while it’s true that no individual has ever managed to design more than one impossible structure, it’s also true that he scarcely goes a week without some architect arriving at his door, haunted...

Greg D

I’m not a data scientist, but I love these. I’ve got a four-hour flight ahead of me and a copy of Microsoft Excel; maybe now is the right time to give one a try!

!It seems like the combination of materials determines the cost of the structure.

!Architects who apprenticed with Johnson or Stamatin always produce impossible buildings; architects who apprenticed with Geisel, Penrose, or Escher NEVER do. Self-taught architects sometimes produce impossible buildings, and sometimes they do not.

!This lets us select 5 designs from our proposals which will ce

... (read more)

[Epistemic status: As I say below, I've been thinking about this topic for several years and I've worked on it as part of my PhD research. But none of this is based on any rigorous methodology, just my own impressions from reading the literature.]

I've been thinking about possible cruxes in AI x-risk debates for several years now. I was even doing that as part of my PhD research, although my PhD is currently on pause because my grant ran out. In particular, I often wonder about "meta-cruxes" - i.e., cruxes related to debates or uncertainties that are more about different epistemological or decision-making approaches rather than about more object-level arguments.

The following are some of my current top candidates for "meta-cruxes" related to AI x-risk debates. There are...

I would add

Conflict theory vs. comparative advantage

Is it possible for the wrong kind of technological development to make things worse, or does anything that increases aggregate productivity always make everyone better off in the long run?

Cosmopolitanism vs. human protectionism

Is it acceptable, or good, to let humans go extinct if they will be replaced by an entity that's more sophisticated or advanced in some way, or should humans defend humanity simply because we're human?


Thanks to Taylor Smith for doing some copy-editing on this.

In this article, I tell some anecdotes and present some evidence in the form of research artifacts about how easy it is for me to work hard when I have collaborators. If you are in a hurry I recommend skipping to the research artifact section.

Bleeding Feet and Dedication

During AI Safety Camp (AISC) 2024, I was working with somebody on how to use binary search to approximate a hull that would contain a set of points, only to knock a glass off of my table. It splintered into a thousand pieces all over my floor.

A normal person might stop and remove all the glass splinters. I just spent 10 seconds picking up some of the largest pieces and then decided...

Holden advised against this:

Jog, don’t sprint. Skeptics of the “most important century” hypothesis will sometimes say things like “If you really believe this, why are you working normal amounts of hours instead of extreme amounts? Why do you have hobbies (or children, etc.) at all?” And I’ve seen a number of people with an attitude like: “THIS IS THE MOST IMPORTANT TIME IN HISTORY. I NEED TO WORK 24/7 AND FORGET ABOUT EVERYTHING ELSE. NO VACATIONS."

I think that’s a very bad idea.

Trying to reduce risks from advanced AI is, as of today, a frustrating and dis

... (read more)
Algon
I can't see a link to any LW dialog at the top.
Johannes C. Mayer
At the top of this document.
Algon
Thanks!

Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece (archive) and others I've seen don't really have details.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.


Updates:

Friday May 17:

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right,

...

It may be that talking about "vested equity" is avoiding some lie that would occur if he made the same claim about the PPUs. If he did mean to include the PPUs as "vested equity" presumably he or a spokesperson could clarify, but I somehow doubt they will.

johnvon
This interview was terrifying to me (and I think to Dwarkesh as well); Schulman continually demonstrates that he hasn't really thought about the AGI future scenarios in that much depth and sort of handwaves away any talk of future dangers. Right off the bat he acknowledges that they reasonably expect AGI in 1-5 years or so, and even though Dwarkesh pushes him, he doesn't present any more detailed plan for safety than "Oh we'll need to be careful and cooperate with the other companies...I guess..."
jacquesthibs
In case people missed this, another safety researcher recently left OpenAI: Ryan Lowe. I don't know Ryan's situation, but he was a "research manager working on AI alignment."

As the dictum goes, “If it helps but doesn’t solve your problem, perhaps you’re not using enough.” But I still find that I’m sometimes not using enough effort, not doing enough of what works, simply put, not using enough dakka. And if reading one post isn’t enough to get me to do something… perhaps there isn’t enough guidance, or examples, or repetition, or maybe me writing it will help reinforce it more. And I hope this post is useful for more than just myself.

Of course, the ideas below are not all useful in any given situation, and many are obvious, at least after they are mentioned, but when you’re trying to get more dakka, it’s probably worth running through the list and considering each one and how it...

Very happy to see a concrete outcome from these suggestions!

LessOnline Festival

May 31st to June 2nd, Berkeley CA