jessicata

Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Comments

"Infohazard" is a predominantly conflict-theoretic concept

Your OP doesn't provide nearly enough information for anyone to make accurate judgments or give useful advice. You could have added more detail that concealed the exact secret while revealing circumstantial information about it, e.g. perhaps it is a technological design that could bring X benefit, but you expect corporate competition over enhancing the technology to cause Y harm. Or you could have offered a fictional moral dilemma similar to your own and asked people to make judgments about it. As it stands, it doesn't seem possible to offer better advice than "think about it more, perhaps discuss with some people you trust, don't publish/market it in the meantime".

"Infohazard" is a predominantly conflict-theoretic concept

Bostrom's original paper defines "infohazard" so as to be inclusive of what you term "secrets". I define "self-infohazard" to describe a specific case of an individual being harmed by themselves knowing something. Perhaps you would like to propose a different taxonomy that disagrees with Bostrom's and/or my definitions?

EDIT: At MIRI, Nate Soares frequently used the term "infohazard" to refer to potentially dangerous technological secrets, in line with Bostrom's usage. I have no reason to believe that anyone at the organization would personally be harmed on net by knowing such technological secrets. I'm not saying you have to agree with Nate here, but I got the impression from this usage that this isn't a case of Bostrom being idiosyncratic.

"Infohazard" is a predominantly conflict-theoretic concept

Yes, different priors (especially ones making hard-to-test predictions) largely have the same effects as different utility functions (e.g. this post discusses simulating belief changes with utility changes). The coffee shop example includes limited communication abilities; perhaps there are other, better examples?

Visible Thoughts Project and Bounty Announcement

How do you think this project relates to Ought? Seems like the projects share a basic objective (having AI predict human thoughts had in the course of solving a task). Ought has more detailed proposals for how the thoughts are being used to solve the task (in terms of e.g. factoring a problem into smaller problems, so that the internal thoughts are a load-bearing part of the computation rather than an annotation that is predicted but not checked for being relevant).

So we are taking one of the outputs that current AIs seem to have learned best to design, and taking one of the places where human thoughts about how to design it seem most accessible, and trying to produce a dataset which the current or next generation of text predictors might be able to use to learn how to predict thoughts about designing their outputs and not just predict the outputs themselves.

As the proposal stands, it seems like the AI's predictions of human thoughts would offer no relevant information about how the AI is predicting the non-thought story content, since the AI could be predicting these different pieces of content through unrelated mechanisms.

Christiano, Cotra, and Yudkowsky on AI progress

This section seemed like an instance of you and Eliezer talking past each other in a way that wasn't locating a mathematical model containing the features you both believed were important (e.g. things could go "whoosh" while still being continuous):

[Christiano][13:46]

Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress

[Yudkowsky][13:46]

unfortunately a lot of what you are saying, from my perspective, has the flavor of, “but can’t you tell me about your predictions earlier on of the impact on global warming at the Homo erectus level”

you have stories about why this is like totally not a fair comparison

I do not share these stories

[Christiano][13:46]

I don’t understand either your objection nor the reductio

like, here’s how I think it works: AI systems improve gradually, including on metrics like “How long does it take them to do task X?” or “How high-quality is their output on task X?”

[Yudkowsky][13:47]

I feel like the thing we know is something like, there is a sufficiently high level where things go whooosh humans-from-hominids style

[Christiano][13:47]

We can measure the performance of AI on tasks like “Make further AI progress, without human input”

Any way I can slice the analogy, it looks like AI will get continuously better at that task

Christiano, Cotra, and Yudkowsky on AI progress

A bunch of this was frustrating to read because it seemed like Paul was yelling "we should model continuous changes!" and Eliezer was yelling "we should model discrete events!" and these were treated as counter-arguments to each other.

It seems obvious from having read about dynamical systems that continuous models still have discrete phase changes. E.g. consider boiling water. As you put in energy, the temperature increases until it gets to the boiling point, at which point more energy put in doesn't increase the temperature further (for a while); instead it converts more of the water to steam. After all the water is converted to steam, more energy put in increases the temperature further.

So there are discrete transitions from (a) energy put in increases water temperature to (b) energy put in converts water to steam to (c) energy put in increases steam temperature.
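A minimal sketch of this energy-temperature curve (the specific-heat and latent-heat constants are the standard approximate values for water; the code is purely illustrative):

```python
# Minimal sketch: temperature of water as energy is added, showing three regimes.
# Approximate constants for water.
C_WATER = 4186.0   # specific heat of liquid water, J/(kg*K)
C_STEAM = 2010.0   # specific heat of steam, J/(kg*K)
L_VAPOR = 2.26e6   # latent heat of vaporization, J/kg
BOIL = 100.0       # boiling point at 1 atm, deg C

def temperature(energy_j, start_temp=20.0, mass_kg=1.0):
    """Temperature after adding energy_j joules to mass_kg of water at start_temp."""
    # Regime (a): heating liquid water up to the boiling point.
    to_boil = mass_kg * C_WATER * (BOIL - start_temp)
    if energy_j <= to_boil:
        return start_temp + energy_j / (mass_kg * C_WATER)
    energy_j -= to_boil
    # Regime (b): converting water to steam; temperature stays flat.
    to_vaporize = mass_kg * L_VAPOR
    if energy_j <= to_vaporize:
        return BOIL
    energy_j -= to_vaporize
    # Regime (c): heating the steam.
    return BOIL + energy_j / (mass_kg * C_STEAM)

# Energy input is continuous, but the curve has discrete regime changes:
for e in [0.0, 2e5, 3.5e5, 1e6, 2e6, 3e6]:
    print(f"{e:>9.0f} J -> {temperature(e):6.1f} C")
```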

In the case of AI improving AI vs. humans improving AI, a simple model to make would be one where AI quality is modeled as a variable, $x$, with the following dynamical equation:

$$\frac{dx}{dt} = a + bx$$

where $a$ is the speed at which humans improve AI and $b$ is a recursive self-improvement efficiency factor. The curve transitions from a line at early times (where $bx \ll a$) to an exponential at later times (where $bx \gg a$). It could be approximated as a piecewise function with a linear part followed by an exponential part, which is a more-discrete approximation than the original function, which has a continuous transition between linear and exponential.
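For concreteness, taking the initial condition $x(0) = 0$, the closed-form solution makes the two regimes explicit:

```latex
% Closed-form solution of dx/dt = a + b x with x(0) = 0:
x(t) = \frac{a}{b}\left(e^{bt} - 1\right)
\approx
\begin{cases}
  a\,t & bt \ll 1 \quad \text{(human-driven progress dominates; approximately linear)} \\[4pt]
  \dfrac{a}{b}\,e^{bt} & bt \gg 1 \quad \text{(recursive self-improvement dominates; approximately exponential)}
\end{cases}
```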

This is nowhere near an adequate model of AI progress, but it's the sort of model that would be created in the course of a mathematically competent discourse on this subject on the way to creating an adequate model.

Dynamical systems theory contains many beautiful and useful concepts, like basins of attraction, which make sense of discrete and continuous phenomena simultaneously (i.e. there are a discrete number of basins of attraction which points fall into based on their continuous properties).
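A minimal sketch of this, using the standard textbook system $\dot{x} = x - x^3$ (which is not part of the AI discussion; it just has two attractors at $x = \pm 1$ whose basins are separated by the unstable point $x = 0$):

```python
# Toy illustration: dx/dt = x - x^3 has attracting fixed points at x = -1 and x = +1.
# Continuously varying initial conditions sort into two discrete basins of attraction,
# separated by the unstable fixed point at x = 0.

def attractor(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x^3 from x0 (forward Euler) and return the attractor reached."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return round(x, 3)

for x0 in [-2.0, -0.5, -0.01, 0.01, 0.5, 2.0]:
    print(f"x0 = {x0:+.2f} -> attractor {attractor(x0):+.3f}")
```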

I've found Strogatz's book, Nonlinear Dynamics and Chaos, helpful for explaining the basics of dynamical systems.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

At least some people were able to understand, though. This led to a sort of social division where some people were much more willing/able to talk about certain social phenomena than other people were.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

He argued:

(a) EA orgs aren't doing what they say they're doing (e.g. cost-effectiveness estimates are wildly biased, reflecting bad procedures being used internally), and it's hard to get organizations to do what they say they do

(b) Utilitarianism isn't a form of ethics; it's still necessary to have principles, as in deontology or two-level consequentialism

(c) Given how hard it is to predict the effects of your actions on far-away parts of the world (e.g. international charity requiring multiple intermediaries working in a domain that isn't well-understood), focusing on helping people you have more information about makes sense unless this problem can be solved

(d) It usually makes more sense to focus on ways of helping others that also build capacities, including gathering more information, to increase long-term positive impact

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

I appreciate that you're thinking about my well-being! While I found it stressful to post this and then read and respond to so many comments, I didn't have much else going on at the time so I did manage to rest a lot. I definitely feel better after having gotten this off my chest.

[Book Review] "The Bell Curve" by Charles Murray

The implied game is:

Step 1: The left decides what is offensively right-wing

Step 2: LW people decide what to say given this

Steven is proposing, for step 2, a policy of not saying anything that the left has decided is offensively right-wing. This gives the left the ability to prevent arbitrary speech.

If the left is offended by negotiating for more than $1 in the ultimatum game, Steven's proposed policy would avoid doing that, thereby yielding. (The money here is metaphorical, representing benefits LW people could get by talking about things without being attacked by the left.)
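As a rough sketch of why the yielding policy concedes everything (a toy model of my own; the topic list and payoffs are made up for illustration):

```python
# Toy model: assume (for illustration) that flagging a topic as "offensively
# right-wing" costs the left nothing, while each topic LW can still discuss
# is worth 1 unit to LW.

TOPICS = {f"topic_{i}" for i in range(10)}  # hypothetical stand-in topics

def lw_payoff_under_yielding_policy(flagged_topics):
    """LW's payoff if its policy is to avoid every flagged topic."""
    return len(TOPICS - flagged_topics)

print(lw_payoff_under_yielding_policy(set()))   # 10: nothing flagged yet
print(lw_payoff_under_yielding_policy(TOPICS))  # 0: if flagging is free, the left's
                                                # best response is to flag everything,
                                                # and the yielding policy concedes it all
```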
