Alex Ray

AI Alignment @ OpenAI


Alex Ray's Shortform

Intersubjective Mean and Variability.

(Subtitle: I wish we shared more art with each other)

This is mostly a reaction to the (10y old) LW post:  Things you are supposed to like.

I think there's two common stories for comparing intersubjective experiences:

  • "Mismatch": Alice loves a book, and found it deeply transformative.  Beth, who otherwise has very similar tastes and preferences to Alice, reads the book and finds it boring and unmoving.
  • "Match": Charlie loves a piece of music.  Daniel, who shares a lot of Charlie's taste in music, listens to it and also loves it.

One way I can think of unpacking this is that there is in terms of distributions:

  • "Mean" - the shared intersubjective experiences, which we see in the "Match" case
  • "Variability" - the difference in intersubjective experiences, which we see in the "Mismatch" case

Another way of unpacking this is due to factors within the piece or within the subject

  • "Intrinsic" - factors that are within the subject, things like past experiences and memories and even what you had for breakfast
  • "Extrinsic" - factors that are within the piece itself, and shared by all observers

And one more ingredient I want to point at is question substitution.  In this case I think the effect is more like "felt sense query substitution" or "received answer substitution" since it doesn't have an explicit question.

  • When asked about a piece (of art, music, etc) people will respond with how they felt -- which includes both intrinsic and extrinsic factors.

Anyways what I want is better social tools for separating out these, in ways that let people share their interest and excitement in things.

  • I think that these mismatches/misfirings (like the LW post that set this off) and the reactions to them cause a chilling effect, where the LW/rationality community is not sharing as much art because of this
  • I want to be in a community that's got a bunch of people sharing art they love and cherish

I think great art is underrepresented in LW and want to change that.

Alex Ray's Shortform

How I would do a group-buy of methylation analysis.

(N.B. this is "thinking out loud" and not actually a plan I intend to execute)

Methylation is a pretty commonly discussed epigenetic factor related to aging.  However it might be the case that this is downstream of other longevity factors.

I would like to measure my epigenetics -- in particular approximate rates/locations of methylation within my genome.  This can be used to provide an approximate biological age correlate.

There are different ways to measure methylation, but one I'm pretty excited about that I don't hear mentioned often enough is the Oxford Nanopore sequencer.

The mechanism of the sequencer is that it does direct-reads (instead of reading amplified libraries, which destroy methylation unless specifically treated for it), and off the device is a time-series of electrical signals, which are decoded into base calls with a ML model.  Unsurprisingly, community members have been building their own base caller models, including ones that are specialized to different tasks.

So the community made a bunch of methylation base callers, and they've been found to be pretty good.

So anyways the basic plan is this:

Why I think this is cool?  Mostly because ONT makes a $1k sequencer than can fit in your pocket, and can do well in excess of 1-10Gb reads before needing replacement consumables.  This is mostly me daydreaming what I would want to do with it.

Aside: they also have a pretty cool $9k sample prep tool, which would be useful to me since I'm empirically crappy at doing bio experiments, but the real solution would probably just be to have a contract lab do all the steps and just send the data.

Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0

In my experience, I haven't seen a good "translation" process -- instead models are pretrained on bigger and bigger corpora which include more languages.

GPT-3 was trained on data that was mostly english, but also is able to (AFAICT) generate other languages as well.

For some english-dependent metrics (SuperGLUE, Winogrande, LAMBADA, etc) I expect a model trained on primarily non-english corpora would do worse.

Also, yes, the tokenization I would expect to be different for a largely different corpora.

Teaching ML to answer questions honestly instead of predicting human answers

I feel overall confused, but I think that's mostly because of me missing some relevant background to your thinking, and the preliminary/draft nature of this.

I hope sharing my confusions is useful to you.  Here they are:

I'm not sure how the process of "spending bits" works.  If the space of possible models was finite and discretized, then you could say spending bits is partitioning down to "1/2^B"th of the space -- but this is not at all how SGD works, and seems incompatible with using SGD (or any optimizer that doesn't 'teleport' through parameter space) as the optimization algorithm.

Spending bits does make sense in terms of naive rejection sampling (but I think we agree this would be intractably expensive) and other cases of discrete optimization like integer programming.  It's possible I would be less confused if this was explained using a different optimization algorithm, like BFGS or some hessian-based method or maybe a black-box bayesian solver.

Separately, I'm not sure why the two heads wouldn't just end up being identical to each other.  In shorter-program-length priors (which seem reasonable in this case; also minimal-description-length and sparse-factor-graph, etc etc) it seems like weight-tying the two heads or otherwise making them identical.

Lastly, I think I'm confused by your big formula for the unnormalized posterior log probability of () -- I think the most accessible of my confusions is that it doesn't seem to pass "basic type checking consistency".

I know the output should be a log probability, so all the added components should be logprobs/in terms of bits.

The L() term makes sense, since it's given in terms of bits.

The two parameter distances seem like they're in whatever distance metric you're using for parameter space, which seems to be very different from the logprobs.  Maybe they both just have some implicit unit conversion parameter out front, but I think it'd be surprising if it were the case that every "1 parameter unit" move through parameter space is worth "1 nat" of information.  For example, it's intuitive to me that some directions (towards zero) would be more likely than other directions.

The C() term has a lagrange multiplier, which I think are usually unitless.  In this case I think it's safe to say it's also maybe doing units conversion.  C() itself seems to possibly be in terms of bits/nats, but that isn't clear.

In normal lagrangian constrained optimization, lambda would be the parameter that gives us the resource tradeoff "how many bits of (L) loss on the data set tradeoff with a single bit (C) of inconsistency"

Finally the integral is a bit tricky for me to follow.  My admittedly-weak physics intuitions are usually that you only want to take an exponential (or definitely a log-sum-exp like this) of unitless quantities, but it looks like it has the maybe the unit of our distance in parameter space.  That makes it weird to integrate over possible parameter, which introduces another unit of parameter space, and then take the logarithm of it.

(I realize that unit-type-checking ML is pretty uncommon and might just be insane, but it's one of the ways I try to figure out what's going on in various algorithms)

Looking forward to reading more about this in the future.

"Existential risk from AI" survey results

Thanks for doing this research and sharing the results.

I'm curious if you or MIRI plan to do more of this kind of survey research in the future, or its just a one-off project.

Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0

I think this take is basically correct.  Restating my version of it:

Mixture of Experts and similar approaches modulate paths through the network, such that not every parameter is used every time.  This means that parameters and FLOPs (floating point operations) are more decoupled than they are in dense networks.

To me, FLOPs remains the harder-to-fake metric, but both are valuable to track moving forward.

Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0

I think the engadget article failed to capture the relevant info, so just putting my preliminary thoughts down here.  I expect my thoughts to change as more info is revealed/translated.

Loss on the dataset (for cross-entropy this is measured in bits of perplexity per token or per character) is a more important metric than parameter count, in my opinion.

However, I think parameter count does matter at least a small part because it is a signal for:
* the amount of resources that are available to the researchers (very expensive to do very large runs)
* the amount of engineering capacity that the project has access to (difficult to write code that functions well at that scale -- nontrivial to just code a working 1.7T parameter model training loop)

I expect more performance metrics at some point, on the normal set of performance benchmarks.

I also expect to be very interested in how they release/share/license the model (if at all), and who is allowed access to it.

Peekskill Lyme Incidence

I'm curious about a couple things about your case if you're willing to share.

1. does this mean you still carry the disease?
2. did the diagnosis involve western blot / checking for antibodies? (vs just observations/location/history/etc)
3. what is your current level of concern about long term symptoms from lyme given this or future exposures?

Alex Ray's Shortform

My feeling is that I don't have a strong difference between them.  In general simpler policies are both easier to execute in the moment and also easier for others to simulate.

The clearest version of this is to, when faced with a decision, decide on an existing principle to apply before acting, or else define a new principle and act on this.

Principles are examples of short policies, which are largely path-independent, which are non-narrative, which are easy to execute, and are straightforward to communicate and be simulated by others.

Alex Ray's Shortform

(Note: this might be difficult to follow.  Discussing different ways that different people relate to themselves across time is tricky.  Feel free to ask for clarifications.)


I'm reading the paper Against Narrativity, which is a piece of analytic philosophy that examines Narrativity in a few forms:

  • Psychological Narrativity - the idea that "people see or live or experience their lives as a narrative or story of some sort, or at least as a collection of stories."
  • Ethical Narrativity - the normative thesis that "experiencing or conceiving one's life as a narrative is a good thing; a richly [psychologically] Narrative outlook is essential to a well-lived life, to true or full personhood."

It also names two kinds of self-experience that it takes to be diametrically opposite:

  • Diachronic - considers the self as something that was there in the further past, and will be there in the further future
  • Episodic - does not consider the self as something that was there in the further past and something that will be there in the further future

Wow, these seem pretty confusing.  It sounds a lot like they just disagree on the definition of the world "self".  I think there is more to it than that, some weak evidence being discussing this concept of length with a friend (diachronic) who had a very different take on narrativity than myself (episodic).

I'll try to sketch what I think "self" means.  It seems that for almost all nontrivial cognition, it seems like intelligent agents have separate concepts (or the concept of a separation between) the "agent" and the "environment".  In Vervaeke's works this is called the Agent-Arena Relationship.

You might say "my body is my self and the rest is the environment," but is that really how you think of the distinction?  Do you not see the clothes you're currently wearing as part of your "agent"?  Tools come to mind as similar extensions of our self.  If I'm raking leaves for a long time, I start to sense myself as a the agent being the whole "person + rake" system, rather than a person whose environment includes a rake that is being held.

(In general I think there's something interesting here in proto-human history about how tool use interacts with our concept of self, and our ability to quickly adapt to thinking of a tool as part of our 'self' as a critical proto-cognitive-skill.)

Getting back to Diachronic/Episodic:  I think one of the things that's going on in this divide is that this felt sense of "self" extends forwards and backwards in time differently.


I often feel very uncertain in my understanding or prediction of the moral and ethical natures of my decisions and actions.  This probably needs a whole lot more writing on its own, but I'll sum it up as two ideas having a disproportionate affect on me:

  • The veil of ignorance, which is a thought experiment which leads people to favor policies that support populations more broadly (skipping a lot of detail and my thoughts on it for now).
  • The categorical imperative, which I'll reduce here as the principle of universalizability -- a policy for actions given context is moral if it is one you would endorse universalizing (this is huge and complex, and there's a lot of finicky details in how context is defined, etc.  skipping that for now)

Both of these prompt me to take the perspective of someone else, potentially everyone else, in reasoning through my decisions.  I think the way I relate to them is very Non-Narrative/Episodic in nature.

(Separately, as I think more about the development of early cognition, the more the ability to take the perspective of someone else seems like a magical superpower)

I think they are not fundamentally or necessarily Non-Narrative/Episodic -- I can imagine both of them being considered by someone who is Strongly Narrative and even them imagining a world consisting of a mixture of Diachronic/Episodic/etc.


Priors are hard.  Relatedly, choosing between similar explanations of the same evidence is hard.

I really like the concept of the Solomonoff prior, even if the math of it doesn't apply directly here.  Instead I'll takeaway just this piece of it:

"Prefer explanations/policies that are simpler-to-execute programs"

A program may be simpler if it has fewer inputs, or fewer outputs.  It might be simpler if it requires less memory or less processing.

This works well for choosing policies that are easier to implement or execute, especially as a person with bounded memory/processing/etc.


A simplifying assumption that works very well for dynamic systems is the Markov property.

This property states that all of the information in the system is present in the current state of the system.

One way to look at this is in imagining a bunch of atoms in a moment of time -- all of the information in the system is contained in the current positions and velocities of the atoms.  (We can ignore or forget all of the trajectories that individual atoms took to get to their current locations)

In practice we usually do this to systems where this isn't literally true, but close-enough-for-practical-purposes, and combine it with stuffing some extra stuff into the context for what "present" means.

(For example we might define the "present" state of a natural system includes "the past two days of observations" -- this still has the Markov property, because this information is finite and fixed as the system proceeds dynamically into the future)


I think that these pieces, when assembled, steer me towards becoming Episodic.

When choosing between policies that have the same actions, I prefer the policies that are simpler. (This feels related to the process of distilling principles.)

When considering good policies, I think I consider strongly those policies that I would endorse many people enact.  This is aided by these policies being simpler to imagine.

Policies that are not path-dependent (for example, take into account fewer things in a person's past) are simpler, and therefore easier to imagine.

Path-independent policies are more Episodic, in that they don't rely heavily on a person's place in their current Narratives.


I don't know what to do with all of this.

I think one thing that's going on is self-fulfilling -- where I don't strongly experience psychological Narratives, and therefore it's more complex for me to simulate people who do experience this, which via the above mechanism leads to me choosing Episodic policies.

I don't strongly want to recruit everyone to this method of reasoning.  It is an admitted irony of this system (that I don't wish for everyone to use the same mechanism of reasoning as me) -- maybe just let it signal just how uncertain I feel about my whole ability to come to philosophical conclusions on my own.

I expect to write more about this stuff in the near future, including experiments I've been doing in my writing to try to move my experience in the Diachronic direction.  I'd be happy to hear comments for what folks are interested in.


Load More