Gears-level models are expensive - often prohibitively expensive. Black-box approaches are usually much cheaper and faster. But black-box approaches rarely generalize - they're subject to Goodhart, need to be rebuilt when conditions change, don't identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
A Theory of Usable Information Under Computational Constraints > We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting predictive V-information encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information and in violation of the data processing inequality, V-information can be created through computation. This is consistent with deep neural networks extracting hierarchies of progressively more informative features in representation learning. Additionally, we show that by incorporating computational constraints, V-information can be reliably estimated from data even in high dimensions with PAC-style guarantees. Empirically, we demonstrate predictive V-information is more effective than mutual information for structure learning and fair representation learning. h/t Simon Pepin Lehalleur
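For reference, the core definition as I understand it from the paper (my paraphrase and notation, not a quote; $\mathcal{V}$ is the family of predictive models the observer can actually compute):

$$H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}_{x,y}\big[-\log f[x](y)\big], \qquad H_{\mathcal{V}}(Y \mid \varnothing) = \inf_{f \in \mathcal{V}} \mathbb{E}_{y}\big[-\log f[\varnothing](y)\big]$$

$$I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X)$$

With an unrestricted $\mathcal{V}$ this recovers Shannon mutual information; restricting $\mathcal{V}$ is what lets computation (e.g. a learned feature extractor) increase $I_{\mathcal{V}}$.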
Several dozen people now presumably have Lumina in their mouths. Can we not simply crowdsource some assays of their saliva? I would chip money in to this. Key questions around ethanol levels, aldehyde levels, antibacterial levels, and whether the organism itself stays colonized at useful levels.
davekasten4412
Epistemic status: not a lawyer, but I've worked with a lot of them. As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony).   Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...

Popular Comments

Recent Discussion

Epistemic status: the stories here are all as true as possible from memory, but my memory is so-so.

An AI made this

This is going to be big

It’s late Summer 2017. I am on a walk in the Mendip Hills. It’s warm and sunny and the air feels fresh. With me are around 20 other people from the Effective Altruism London community. We’ve travelled west for a retreat to discuss how to help others more effectively with our donations and careers. As we cross cow field after cow field, I get talking to one of the people from the group I don’t know yet. He seems smart, and cheerful. He tells me that he is an AI researcher at Google DeepMind. He explains how he is thinking about...

Sure, he's trying to cause alarm via alleged excerpts from his life. Surely society should have some way to move to a state of alarm iff that's appropriate; do you see a better protocol than this one?

I have liked music very much since I was a teenager. I spent many hours late at night in Soulseek chat rooms talking about and sharing music with my online friends. So, I tend to just have some music floating around in my head on any given day. But, I never learned to play any instrument, or use any digital audio software. It just didn't catch my interest.

My wife learned to play piano as a kid, so we happen to have a keyboard sitting around in our apartment. One day I was bored so I decided to just see whether I could figure out how to play some random song that I was thinking about right then. I found I was easily able to reconstitute a piano...

2Algon
Why do you think DHA algae powder works?
1keltan
Only bc I’m vegan. If I wasn’t, I wouldn’t be supplementing it. I wish I could say I had a more accurate model. But my understanding doesn’t go deeper than DHA = myelin = faster processing.

Was this purely a question? Or is there something I should look into here?
Algon30

No, it's just that my prior says nootropics almost never work, so I was wondering if you had some data suggesting this one did, e.g. by doing an RCT on yourself or using signal processing techniques to detect whether supplementing this stuff led to a causal change in reflex times or so forth.

EDIT: Though I am vegan and I'm really ignorant about what makes for a good diet. So I'd be curious to hear why it's helpful for vegans to take this stuff.
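As a rough illustration of the kind of self-experiment described above, here's a minimal sketch (the reaction-time numbers are made up, and a real self-RCT would also need randomized, blinded condition assignment):

```python
# Minimal sketch: comparing reaction times across supplement vs. placebo days.
# The numbers below are hypothetical; a real self-RCT would need randomized,
# blinded condition assignment and many more measurements.
from scipy import stats

reaction_ms_placebo = [268, 271, 259, 275, 266, 270, 263, 272]  # daily means (ms)
reaction_ms_dha = [261, 258, 265, 255, 263, 259, 257, 262]      # daily means (ms)

# Welch's t-test on the two conditions; a fuller analysis would also check for
# learning effects, trends, and autocorrelation across days.
t_stat, p_value = stats.ttest_ind(reaction_ms_placebo, reaction_ms_dha, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```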

26Stephen Fowler
Very Spicy Take

Epistemic note: Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn't be allowed anywhere near AI safety decision making in the future. To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.

To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."
1keltan
I’d like to see people who are more informed than I am have a conversation about this. Maybe at Less.online? https://www.lesswrong.com/posts/zAqqeXcau9y2yiJdi/can-we-build-a-better-public-doublecrux
12mesaoptimizer
I just realized that Paul Christiano and Dario Amodei have both probably signed non-disclosure + non-disparagement contracts since they both left OpenAI. That impacts how I'd interpret Paul's (and Dario's) claims and opinions (or the lack thereof) relating to OpenAI, or to alignment proposals entangled with what OpenAI is doing. If Paul has systematically silenced himself, and a large amount of Open Phil and SFF money has been mis-allocated because of systematically skewed beliefs that these organizations have had due to Paul's opinions or lack thereof, well. I don't think this is the case though -- I expect Paul, Dario, and Holden have all converged on similar beliefs (whether they track reality or not) and have taken actions consistent with those beliefs.
bideup10

Can anybody confirm whether Paul is likely systematically silenced re OpenAI?

Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow

One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more.


Introduction

Every time we sit down in front of an LLM like GPT-4, it starts with a blank slate. It knows nothing[1] about who we are, other than what it knows about users in general. But with every word we type, we reveal more about ourselves -- our beliefs, our personality, our education level, even our gender. Just how clearly does the model see us by the end of the conversation, and why should that worry us?

Like many, we were rather startled when @janus showed...

1Martin Vlach
To work around the top-n logprobs restriction, you can supply a logit_bias map to the API.
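For what it's worth, a minimal sketch of that with the OpenAI Python client (the token IDs here are placeholders to be looked up with the model's tokenizer; logit_bias maps token IDs to bias values in [-100, 100]):

```python
# Sketch: bias specific tokens upward so their logprobs surface even when they
# would not appear in the default top-n logprobs returned by the API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
    # Placeholder token IDs -- look up real ones with tiktoken for this model.
    logit_bias={"12345": 10, "67890": 10},
)
print(response.choices[0].logprobs.content[0].top_logprobs)
```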
1Martin Vlach
As the Llama 3 70B base model is said to be very clean (unlike the DeepSeek base model, for example, which is already instruction-spoiled) and similarly capable to GPT-3.5, you could explore that hypothesis. Details: check Groq or TogetherAI for free inference; not sure if the test data would fit Llama 3's context window.
28e9
It's now possible to get mostly deterministic outputs if you set the seed parameter to an integer of your choice, the other parameters are identical, and the model hasn't been updated.
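For example, a minimal sketch with the OpenAI Python client (the model name is just an example; the response's system_fingerprint indicates whether the backend configuration changed between calls):

```python
# Sketch: requesting (mostly) reproducible completions via the seed parameter.
from openai import OpenAI

client = OpenAI()

def sample(seed: int) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Name three prime numbers."}],
        temperature=0,
        seed=seed,  # same seed + identical parameters + same model => usually the same output
    )
    # system_fingerprint changes when the serving configuration changes, which is
    # one reason determinism is only "mostly" guaranteed.
    print(response.system_fingerprint)
    return response.choices[0].message.content

print(sample(42) == sample(42))  # often True, but not guaranteed
```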

Summary:

We think a lot about aligning AGI with human values. I think it’s more likely that we’ll try to make the first AGIs do something else. This might intuitively be described as trying to make instruction-following (IF) or do-what-I-mean-and-check (DWIMAC) be the central goal of the AGI we design. Adopting this goal target seems to improve the odds of success of any technical alignment approach. This goal target avoids the hard problem of specifying human values in an adequately precise and stable way, and substantially helps with goal misspecification and deception by allowing one to treat the AGI as a collaborator in keeping it aligned as it becomes smarter and takes on more complex tasks.

This is similar but distinct from the goal targets of prosaic alignment efforts....

1wassname
When you rephrase this to be about search engines, it doesn't describe reality. Most of us consume search and recommendations that have been censored (e.g. removing porn, piracy, toxicity, racism, taboo politics) in a way that puts cultural values over our preferences or interests. So perhaps it won't be true for AI either. At least in the near term, the line between AI and search is blurred, and the same pressures exist on consumers and providers.

I also expect AIs to be constrained by social norms, laws, and societal values. But I think there's a distinction between how AIs will be constrained and how AIs will try to help humans. Although it often censors certain topics, Google still usually delivers the results the user wants, rather than serving some broader social agenda upon each query. Likewise, ChatGPT is constrained by social mores, but it's still better described as a user assistant, not as an engine for social change or as a benevolent agent that acts on behalf of humanity.

4Seth Herd
Thanks for engaging. I did read your linked post. I think you're actually in the majority in your opinion on AI leading to a continuation and expansion of business as usual.

I've long been curious about this line of thinking; while it makes a good bit of sense to me for the near future, I become confused at the "indefinite" part of your prediction. When you say that AI continues from the first step indefinitely, it seems to me that you must believe one or more of the following:

* No one would ever tell their arbitrarily powerful AI to take over the world
  * Even if it might succeed
* No arbitrarily powerful AI could succeed at taking over the world
  * Even if it was willing to do terrible damage in the process
* We'll have a limited number of humans controlling arbitrarily powerful AI
  * And an indefinitely stable balance-of-power agreement among them
* By "indefinitely" you mean only until we create and proliferate really powerful AI

If I believed in any of those, I'd agree with you. Or perhaps I'm missing some other belief we don't share that leads to your conclusions. Care to share?

Separately, in response to that post: the post you linked was titled "AI values will be shaped by a variety of forces, not just the values of AI developers". In my prediction here, AI and AGI will not have values in any important sense; they will merely carry out the values of their principals (their creators, or the government that shows up to take control). This might be just a terminological distinction, except for the following bit of implied logic: I don't think AI needs to share clients' values to be of immense economic and practical advantage to them. When (if) someone creates a highly capable AI system, they will instruct it to serve customers' needs in certain ways, including following their requests within certain limits; this will not necessitate changing the A(G)I's core values (if they exist) to use it to make enormous profits when licensed to clients. To...
5Matthew Barnett
This is closest to what I am saying. The current world appears to be in a state of inter-agent competition. Even as technology has gotten more advanced, and as agents have gotten more powerful over time, no single unified agent has been able to obtain control over everything and win the entire pie, defeating all the other agents. I think we should expect this state of affairs to continue even as AGI gets invented and technology continues to get more powerful.

(One plausible exception to the idea that "no single agent has ever won the competition over the world" is the human species itself, which dominates other animal species. But I don't think the human species is well-described as a unified agent, and I think our power comes mostly from accumulated technological abilities, rather than raw intelligence by itself. This distinction is important because the effects of technological innovation generally diffuse across society rather than giving highly concentrated powers to the people who invent stuff. This generally makes the situation with humans vs. animals disanalogous to a hypothetical AGI foom in several important ways.)

Separately, I also think that even if an AGI agent could violently take over the world, it would likely not be rational for it to try, since compromising with the rest of the world would be a less risky and more efficient way of achieving its goals. I've written about these ideas in a shortform thread here.

Originally published at https://nonzerosum.games. Come visit for the full experience.

From the genetic lottery we’re thrown into at birth, to the educational opportunities we navigate, the jobs we compete for, and the relationships we cultivate — every stage in life is marked by wins and losses, strategies and tactics, alliances, and competition. But not all games are zero-sum showdowns. To understand non-zero-sumness it helps to consider “sum” different types of games.

  • zero-sum games
  • positive-sum games
  • negative-sum games
  • meta-games

ZERO-SUM GAMES

… are games in which two parties compete and, in order for one party to win, the other must lose. The positive payoff for the winner in such a game requires an equally negative payoff for the loser, so that the sum of the payoffs is zero, hence “zero-sum”. Chess is a zero-sum game because players...
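As a toy illustration of the payoffs-sum-to-zero property (matching pennies is my example, not the post's):

```python
# Sketch: payoff matrix for matching pennies, a canonical zero-sum game.
# Each cell is (payoff to A, payoff to B); every cell sums to zero.
payoffs = {
    ("Heads", "Heads"): (1, -1),
    ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1),
    ("Tails", "Tails"): (1, -1),
}
assert all(a + b == 0 for a, b in payoffs.values())
```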

2Lorxus
Surely so! Hit me up if you ever end up doing this - I'm likely getting the Lumina treatment in a couple months.

A before and after would be even better!

9kave
I think Romeo is thinking of checking a bunch of mediators of risk (like aldehyde levels) as well as of function (like whether the organism stays colonised).

Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.


Updates (Friday May 17):

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact,

...

Organizational structure is an alignment mechanism. 

While I sympathize with the stated intentions, I just can't wrap my head around the naivety. OpenAI's corporate structure was a recipe for bad corporate governance. "We are the good guys here, the structure is needed to make others align with us" - an organization where ethical people can rule as benevolent dictators is the same mistake committed socialists made when they had power.

If it was that easy, AI alignment would be solved by creating ethical AI commit... (read more)

11Benjamin Sturgeon
Actually, as far as I know, this is wrong. He simply hasn’t been back to the offices but has been working remotely. https://www.vox.com/future-perfect/2024/5/17/24158403/openai-resignations-ai-safety-ilya-sutskever-jan-leike-artificial-intelligence This article goes into some detail and seems quite good.
16mike_hawke
Thanks for the source. I've intentionally made it difficult for myself to log into twitter. For the benefit of others who avoid Twitter, here is the text of Kelsey's tweet thread:
4wassname
Thanks, but this doesn't really give insight on whether this is normal or enforceable. So I wanted to point out that we don't know if it's enforceable, and have not seen a single legal opinion.

Update: further research has led me to believe that people of all races should test themselves for ALDH deficiency before using Lumina. Even if you don't exhibit AFR symptoms when drinking alcohol, your ALDH activity may still be decreased.

Many people in the rational sphere have been promoting Lumina/BCS3-L1, a genetically engineered bacterium, as an anti-cavity treatment. However, none have brought up a major negative interaction that may occur with a common genetic mutation.

In short, the treatment works by replacing lactic-acid-generating bacteria in the mouth with ones that instead convert sugars to ethanol, among other changes. Scott Alexander made a pretty good FAQ about this. Lactic acid results in cavities and tooth demineralization, while ethanol does not. I think this is a really cool idea, and...

EGI10

What you are missing here is that S. mutans often lives in pockets between tooth and epithelium, or between teeth with direct permanent contact to epithelium. Due to the geometry of these spaces, access to saliva is very poor, so metabolites can accumulate to levels well beyond those you suggest here.

This mechanism is also a big problem with the pH study above.

LessOnline Festival

May 31st to June 2nd, Berkeley, CA