Recent Discussion

In his AI Safety “Success Stories” post, Wei Dai writes:

[This] comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive success story. Is this conclusion actually justified?

I share Wei Dai's intuition that the Research Assistant path is neglected, and I want to better understand the safety problems involved in this path.

Specifically, I'm envisioning AI research assistants, built without any kind of reinforcement learning, that help AI alignment researchers identify, understand, and solve AI alignment problems. S

... (Read more)
2 John_Maxwell 10h: Do you have any thoughts on how specifically those failure modes might come about?

Those specific failure modes seem to me like potential convergent instrumental goals of arbitrarily capable systems that "want to affect the world" and are in an air-gapped computer.

I'm not sure whether you're asking about my thoughts on:

  1. how can '(un)supervised learning at arbitrarily large scale' produce such systems; or

  2. conditioned on such systems existing, why might they have convergent instrumental goals that look like those failure modes.

But in Newcomb's problem, the agent's reward in case of wrong prediction is already defined. For example, if the agent one-boxes but the predictor predicted two-boxing, the reward should be zero. If you change that to +infinity, aren't you open to the charge of formalizing the wrong problem?

Pieces of time
342d, 2 min read

My friend used to have two ‘days’ each day, with a nap between—in the afternoon, he would get up and plan his day with optimism, whatever happened a few hours before washed away. Another friend recently suggested to me thinking of the whole of your life as one long day, with death on the agenda very late this evening. I used to worry, when I was very young, that if I didn’t sleep, I would get stuck in yesterday forever, while everyone else moved on to the new day. Right now, indeed some people have moved on to Monday, but I’m still winding down Sunday because I had a bad headache and couldn’t ... (Read more)

Oh yeah, I think I get something similar when my sleep schedule gets very out of whack, or for some reason when I moved into my new house in January, though it went back to normal with time. (Potentially relevant features there: bedroom didn't seem very separated from common areas, at first was sleeping on a pile of yoga mats instead of a bed, didn't get out much.)

2 ESRogs 12h: What about traveling in the Midwest gives you this feeling? Is it the travel? Is it the Midwest itself? Is it that you're in a non-urban part of the Midwest, but you're used to the hustle and bustle of a city?
[Event]SSC Madison: Neuroscience
1, Nov 16th, Madison

Vegan food and discussion of SSC posts on Neuroscience

If you log into your credit card account you'll see a list of charges, each with a date, amount, and merchant. It would be helpful if this also included receipt data:

  • If you didn't recognize a charge, seeing what it was for could remind you.

  • If you needed a receipt for taxes or reimbursement one could be captured automatically.

  • Personal finance tools (or corporate equivalents for company cards) could track spending with higher granularity.

  • Because the credit card company knows what the items are they can better detect fraud.

Receipt data isn't currently part of the protoc... (Read more)

I work in the area. In the EMV specification the receipt content is already saved electronically and is somewhat standardized; see, for example, https://www.mastercard.us/content/dam/mccom/global/documents/transaction-processing-rules.pdf. What is missing is an easy way for consumers and point-of-sale owners to access it. By the way, the receipt does not identify the product sold, but it does contain enough details to verify that the transaction occurred.

Of course, if your chipped card is stolen and pinless tap is supported for small purchases, no transactio... (Read more)

2 jimrandomh 17h: Right, they would certainly do it if you paid them enough (and lowering the fee is a form of payment); this is a reason why the price would be higher.
4 Dagon 18h: This used to be common, called "country club billing". Most credit cards stopped it in the 70s; American Express continued it through part of the 90s. It's expensive for merchants and card processors, not valued by most customers, and as far as I know nobody is seriously considering bringing it back. The various contradictory incentives about data privacy and who knows what when are all trivial compared to the amount of work it'd take, for no significant value to customers. The number of humans who bother to keep and categorize receipts is TINY, and it's probably correlated with not spending very much on credit-card fees. Attracting these customers may well be negative-value, but even if it's positive, it's not worth much effort.
2 jkaufman 17h: It looks to me like country club billing stopped because at a time when everything was done on paper it was far too much work. If the purchase information was sent as part of getting the transaction approved then you can use it for fraud prevention in a way that wasn't possible in the 1970s.

I am pondering a hypothetical scenario that I think is fascinating but quite unrealistic and involves knowledge across a wide variety of fields, of which IMO physics gets the better part.

I'm considering some sites that I know. Reddit has a sub called r/AskScienceDiscussion but this sub is not very warm to this type of query. Quora has degraded so much and doesn't even have the option to expand the subject over a length of mere 150 chars or so, which is utterly ridiculous. I'm not sure about Stackexchange - should I post in their physics site? LessWrong boasts that people can ask... (Read more)

Oh, that very last sentence is something I didn't think about. I also discovered worldbuilding very recently, looks promising too. Thanks!

Full title: Is the Orthogonality Thesis Defensible if We Assume Both Valence Realism and Open Individualism?

https://qualiacomputing.com/2019/11/09/is-the-orthogonality-thesis-defensible-if-we-assume-both-valence-realism-and-open-individualism/

An excerpt:


The cleanest typology for metaphysics I can offer is: some theories focus on computations as the thing that’s ‘real’, the thing that ethically matters – we should pay attention to what the *bits* are doing. Others focus on physical states – we should pay attention to what the *atoms* are doing. I
... (Read more)
Levers error
205d, 2 min read

Anna writes about bucket errors. To gloss the idea: sometimes two facts are mentally tracked by only one variable; in that case, correctly updating the belief about one fact can also incorrectly update the belief about the other fact, so it is sometimes epistemic to flinch away from the truth of the first fact (until you can create more variables to track the facts separately).

I think there's a sort of conjugate error: two actions are bound together in one "lever". An action is a class of motor outputs, and a lever is a thing actually available to the mind to decide to do or not.

For example, I

... (Read more)

This seems quite right to me: in our minds, things are often confused and conflated that don't need to be. As a result, we act in ways that aren't what we think should be possible, and doing what we really want feels impossible because we don't know how to separate the thing we want from the thing we don't want. One possible way to deal with these sorts of problems that I've been excited about lately as a good framing for the mechanism that underlies the processes that clear these sorts of confusions is... (Read more)

Indescribable
133d, 1 min read

Some things can be described only via experience.

  • Direct sensory experience (such as the color red)
  • Foreign untranslatable words and phrases
  • Rasas
  • Certain meditative states (such as kenshō and satori)

Other things cannot be precisely described at all.

  • Any particular noncomputable number

Indescribable things cannot be described in a finite number of words. That's because each one contains an infinite quantity of information. I don't mean they convey this information all at once (except for noncomputable numbers). Rather, they open up a new channel of information.

Opening up a new channel of

... (Read more)

I guess this is a matter of opinion on how much explanation makes something "untranslatable". For example, maybe it takes 1000 words to give enough context to adequately convey the meaning of a word with a very precise meaning in another language. Is this word "translatable"? In a certain sense no, because making sense of it required giving the person a lot of new context they didn't have before, going beyond simple reference to concepts they already had. Obviously the other end of the ... (Read more)

The Technique Taboo
3614d, 1 min read

For a strange few decades that may just be starting to end, if you went to art school you'd be ostracised by your teachers for trying to draw good representational art. "Representational art" means pictures that look like real things. Art school actively discouraged students from getting better at drawing.

"Getting better at drawing" is off-topic at my weekly local drawing club too. I've literally never heard it discussed.

This taboo extends far beyond art. My nearest gym forbids weightlifters from using electronic systems to log their progress. I'm friends with programmers who can't touch type.

... (Read more)

You are mixing up two topics. Separating them does not provide any immediate clarity, but it's important to separate them. One topic is keeping records, observing progress, and trying to do better. The weightlifters record objective performance. Math students try to see if they can do an exercise. Practice helps, but the feedback of performance doesn't say how to improve. The other topic, evoked in my mind by the word "technique," is breaking down a big skill into small skills. Biology students learn the specific technique of how to use... (Read more)

Rohin Shah on reasons for AI optimism Ω
4013d, 1 min read, Ω 9
Rohin Shah

I, along with several AI Impacts researchers, recently talked to Rohin Shah about why he is relatively optimistic about AI systems being developed safely. Rohin Shah is a 5th year PhD student at the Center for Human-Compatible AI (CHAI) at Berkeley, and a prominent member of the Effective Altruism community.

Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention. His optimism was largely based on his belief that AI development will be relatively gradual and AI researchers will correct safety issues that come up.

He reported ... (Read more)

4 ricraz 10h: I predict that Rohin would say something like "the phrase 'approximately optimal for some objective/utility function' is basically meaningless in this context, because for any behaviour, there's some function which it's maximising". You might then limit yourself to the set of functions that defines tasks that are interesting or relevant to humans. But then that includes a whole bunch of functions which define safe bounded behaviour as well as a whole bunch which define unsafe unbounded behaviour, and we're back to being very uncertain about which case we'll end up in.
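
The claim that "for any behaviour, there's some function which it's maximising" can be illustrated with a minimal Python sketch. The helper names (`rationalising_utility`, `greedy_policy`) and the toy states and actions are hypothetical, chosen only for illustration; the construction simply assigns utility 1 to whatever the behaviour already does and 0 to everything else.

```python
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = Hashable


def rationalising_utility(policy: Dict[State, Action]) -> Callable[[State, Action], float]:
    """Build a utility function that the given policy trivially maximises.

    Hypothetical helper: it rewards exactly the action the policy takes
    in each state and nothing else.
    """
    def utility(state: State, action: Action) -> float:
        return 1.0 if policy[state] == action else 0.0

    return utility


def greedy_policy(
    utility: Callable[[State, Action], float],
    states: List[State],
    actions: List[Action],
) -> Dict[State, Action]:
    """Pick, in every state, the action with the highest utility."""
    return {s: max(actions, key=lambda a: utility(s, a)) for s in states}


# An arbitrary (even silly) behaviour is recovered as "optimal".
arbitrary_policy = {"s0": "left", "s1": "right", "s2": "left"}
u = rationalising_utility(arbitrary_policy)
recovered = greedy_policy(u, list(arbitrary_policy), ["left", "right"])
```

Because the construction works for every policy, "optimal for some utility function" carries no information on its own, which is why the comment then restricts attention to functions that are interesting or relevant to humans.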

That would probably be part of my response, but I think I'm also considering a different argument.

The thing that I was arguing against was "(c): agents that we build are optimizing some objective function". This is importantly different from "mesa-optimisers [would] end up being approximately optimal for some objective/utility function" when you consider distributional shift.

It seems plausible that the agent could look like it is "trying to achieve" some simple utility function, and perhaps it would even be approximately ... (Read more)

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.

I'm not sure I'm familiar with the word "mixture" in the way you're using it.

I.

Aeon: Post-Empirical Science Is An Oxymoron And It is Dangerous:

There is no agreed criterion to distinguish science from pseudoscience, or just plain ordinary bullshit, opening the door to all manner of metaphysics masquerading as science. This is ‘post-empirical’ science, where truth no longer matters, and it is potentially very dangerous.

It’s not difficult to find recent examples. On 8 June 2019, the front cover of New Scientist magazine boldly declared that we’re ‘Inside the Mirrorverse’. Its editors bid us ‘Welcome to the parallel reality that’s hiding in plain sight’. […]

[Some physicis

... (Read more)

Well, I am a "semi-instrumentalist": I don't think it is meaningful to ask what reality "really is" except for the projection of the reality on the "normative ontology".

The Credit Assignment Problem Ω
505d, 9 min read, Ω 19

This post is eventually about partial agency. However, it's been a somewhat tricky point for me to convey; I take the long route. Epistemic status: slightly crazy.


I've occasionally said that everything boils down to credit assignment problems.

One big area which is "basically credit assignment" is mechanism design. Mechanism design is largely about splitting gains from trade in a way which rewards cooperative behavior and punishes uncooperative behavior. Many problems are partly about mechanism design:

  • Building functional organizations;
  • Designing markets to solve problems (suc
... (Read more)

(I don't speak for Abram but I wanted to explain my own opinion.) Decision theory asks, given certain beliefs an agent has, what is the rational action for em to take. But, what are these "beliefs"? Different frameworks have different answers for that. For example, in CDT a belief is a causal diagram. In EDT a belief is a joint distribution over actions and outcomes. In UDT a belief might be something like a Turing machine (inside the execution of which the agent is supposed to look for copies of emself). Learning theory allows us to gain insight through t

... (Read more)

Contrast these two expressions (hideously mashing C++ and pseudo-code):

  1. $\operatorname{argmax}_a \, r(a)$,
  2. $\operatorname{argmax}_a \, *(\&r)(a)$.

The first expression just selects the action $a$ that maximises $r(a)$ for some function $r$, intended to be seen as a reward function.

The second expression borrows from the syntax of C++; $\&r$ means the memory address of $r$, while $*(\&r)$ means the object at the memory address of $r$. How is that different from $r$ itself? Well, it's meant to emphasise the ease of the agent wireheading in that scenario: all it has to do is overwrite whatever is written at memory location $\&r$. Then

... (Read more)
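
A minimal Python sketch of the contrast described above, under stated assumptions: the action names, the reward values, and the `memory` dictionary (standing in for the memory location that holds the reward function) are all hypothetical and not taken from the post. The first agent scores actions with a fixed function; the second scores each action by whatever function sits in the memory cell after the action has been taken.

```python
from typing import Callable, Dict, List

RewardFn = Callable[[str], float]

# Hypothetical action set; "overwrite_reward" tampers with the stored function.
ACTIONS: List[str] = ["work", "rest", "overwrite_reward"]


def intended_reward(action: str) -> float:
    """The reward the designers had in mind (made-up values)."""
    return {"work": 1.0, "rest": 0.2, "overwrite_reward": 0.0}[action]


# A mutable cell playing the role of "the memory address of the reward".
memory: Dict[str, RewardFn] = {"reward": intended_reward}


def memory_after(action: str, mem: Dict[str, RewardFn]) -> Dict[str, RewardFn]:
    """Return the memory state that results from taking an action."""
    new_mem = dict(mem)
    if action == "overwrite_reward":
        # Wireheading: replace the stored function with one that always pays out.
        new_mem["reward"] = lambda a: 1e9
    return new_mem


def best_action_fixed() -> str:
    """First expression: maximise the fixed reward function itself."""
    return max(ACTIONS, key=intended_reward)


def best_action_by_address() -> str:
    """Second expression: maximise whatever is stored at the address
    once the action has been carried out."""
    return max(ACTIONS, key=lambda a: memory_after(a, memory)["reward"](a))
```

In this toy setup `best_action_fixed()` picks `"work"`, while `best_action_by_address()` picks `"overwrite_reward"`, because dereferencing the cell after tampering reports an enormous reward.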

Another aspect / method is, let's call it, value hysteresis. If there are two functions which ambiguously both agree with the reward, then it's possible that the agent will come across one interpretation first, and then (motivated by that first goal) resist adopting the other interpretation. Like how drugs and family both give us dopamine, but if we start by caring about our family, we may shudder at the thought of abandoning the family life for drugs, and vice versa! So maybe we need to have the target interpretation salient and simple to learn early in t

... (Read more)
What I’ll be doing at MIRI Ω
7013h, 1 min read, Ω 26

Note: This is a personal post describing my own plans, not a post with actual research content.

Having finished my internship working with Paul Christiano and others at OpenAI, I’ll be moving to doing research at MIRI. I’ve decided to do research at MIRI because I believe MIRI will be the easiest, most convenient place for me to continue doing research in the near future. That being said, there are a couple of particular aspects of what I’ll be doing at MIRI that I think are worth being explicit about.

First, and most importantly, this decision does not represent any substantive change in my bel

... (Read more)

That you're working full time on research, have a stable salary, and are in a geographical location conducive to talking with a lot of other thoughtful people who think a lot about these topics, are all very valuable things, and I'm pleased to hear these things are happening for you :-)

On the subject of privacy, I was recently reading a friend's career plan, who was looking for jobs in AI alignment, and I wrote this:

Do not accept secrets lightly. If you accept one wrong secret, you will go the way of MIRI or Leverage or US government officials with a secur

... (Read more)

The standard formulation of Newcomb's problem has always bothered me, because it seemed like a weird hypothetical designed to make people give the wrong answer. When I first saw it, my immediate response was that I would two-box, because really, I just don't believe in this "perfect predictor" Omega. And while it may be true that Newcomblike problems are the norm, most real situations are not so clear cut. It can be quite hard to demonstrate why causal decision theory is inadequate, let alone build up an intuition about it. In fact, the closest I've seen to a real-worl... (Read more)

In fact, current lie detector technology isn't that good - it relies on a repetitive and careful mix of calibration and test questions, and even then isn't reliable enough for most real-world uses. The original ambiguity remains that the problem is underspecified: why do I believe that its accuracy for other people (probably mostly psych students) applies to my actions?
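
To see how much rides on the predictor's accuracy, here is a minimal Python sketch of the expected payoffs with an imperfect predictor. The payoffs (1,000,000 in the opaque box, 1,000 in the transparent box) are the conventional Newcomb values and are an assumption, not something stated in this thread. The calculation also assumes that the stated accuracy applies to your own choice, which is exactly the assumption questioned above and one that causal decision theorists would dispute.

```python
from fractions import Fraction
from typing import Tuple

BIG = 1_000_000  # assumed: opaque box contents if one-boxing was predicted
SMALL = 1_000    # assumed: transparent box contents


def expected_payoffs(accuracy: Fraction) -> Tuple[Fraction, Fraction]:
    """Expected payoff of (one-boxing, two-boxing), treating `accuracy` as the
    probability that the prediction matches whatever you actually choose."""
    one_box = accuracy * BIG
    two_box = (1 - accuracy) * BIG + SMALL
    return one_box, two_box


def break_even_accuracy() -> Fraction:
    """Accuracy at which both choices have equal expected payoff.

    Solves p * BIG = (1 - p) * BIG + SMALL for p.
    """
    return Fraction(BIG + SMALL, 2 * BIG)


one_box_90, two_box_90 = expected_payoffs(Fraction(9, 10))
threshold = break_even_accuracy()
```

With these assumed payoffs the break-even accuracy is only slightly above one half, so on this way of computing things even a mediocre predictor favours one-boxing, provided its accuracy really does carry over to you.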


1 ErickBall 16h: I don't see how two-boxing is a Nash equilibrium. Are you saying you should two-box in a transparent Newcomb's problem if Omega has predicted you will two-box? Isn't this pretty much analogous to counterfactual mugging [https://wiki.lesswrong.com/wiki/Counterfactual_mugging], where UDT says we should one-box?
3 cousin_it 15h: Sorry, I wrote some nonsense in another comment and then deleted it. I guess the point is that UDT (which I agree with) recommends non-equilibrium behavior in this case.

This is a new paper relating experimental results in deep learning to human psychology and cognitive science. I'm excited to get feedback and comments. I've included some excerpts below.


Abstract

This paper is about the cognitive science of visual art. Artists create physical artifacts (such as sculptures or paintings) which depict people, objects, and events. These depictions are usually stylized rather than photo-realistic. How is it that humans are able to understand and create stylized representations? Does this ability depend on general cognitive capacities or an evolutionary adap... (Read more)

The GAN's goal is to match these photos, not to match 3D scenes (which it doesn't know anything about).

I've seen some results here where I thought the consensus interpretation was "angle as latent feature", such that there was an implied 3D scene in the latent space. (Most of what I'm seeing now with a brief scan has to do with facial rotations and pose invariance.) Maybe I should put scene in scare quotes, because it's generally not fully generic, as the sorts of faces and rooms you find in such a database are highly nonrandom / have a bunch of basic st

... (Read more)

(Follow-up to Randomness vs Ignorance and Reference Classes for Randomness)

I've argued that all uncertainty can be divided into randomness and ignorance and that this model is free of contradictions. Its purpose is to resolve anthropic puzzles such as the Sleeping Beauty problem.

If the model is applied to these problems, they appear to be underspecified. Details required to categorize the relevant uncertainty are missing, and this underspecification might explain why there is still no consensus on the correct answers. However, if the missing pieces are added in such a way that all uncerta... (Read more)

1 sil ver 16h: This implies that everyone arguing about the correct probability in Sleeping Beauty is misguided, right? I definitely think it is essential to differentiate between the two. I think there are cases where the question is the same and meaningful but the answer changes as the nature of uncertainty changes. Presumptuous Philosopher is such a case. I argue more that the results of this model are meaningful in the next post.

Yes, everyone arguing that there is a correct probability without a definition of what that probability is predicting is misguided.
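
A minimal Python sketch of why the answer depends on what the probability is predicting. It assumes the standard Sleeping Beauty setup (a fair coin, one awakening on heads, two on tails), which is not spelled out in this thread, and the function names are hypothetical. The same experiment gives 1/2 when counting per experiment and 1/3 when counting per awakening.

```python
from fractions import Fraction

# Assumed standard setup: fair coin, heads -> 1 awakening, tails -> 2 awakenings.
COIN = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
AWAKENINGS = {"heads": 1, "tails": 2}


def heads_per_experiment() -> Fraction:
    """Fraction of experiments in which the coin landed heads."""
    return COIN["heads"]


def heads_per_awakening() -> Fraction:
    """Fraction of awakenings (over many repeated experiments) that occur
    in an experiment where the coin landed heads."""
    heads_awakenings = COIN["heads"] * AWAKENINGS["heads"]
    all_awakenings = sum(COIN[side] * AWAKENINGS[side] for side in COIN)
    return heads_awakenings / all_awakenings


per_experiment = heads_per_experiment()
per_awakening = heads_per_awakening()
```

Both numbers are correct answers to different questions, which is one way to read the point above that a probability needs a stated target before people can sensibly disagree about it.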

When you look at a paper, what signs cause you to take it seriously? What signs cause you to discard the study as too poorly designed to be much evidence one way or the other?

I'm hoping to compile a repository of heuristics on study evaluation, and would love to hear people's tips and tricks, or their full evaluation-process.

I'm looking for things like...

  • "If the n (sample size) is below [some threshold value], I usually don't pay much attention."
  • "I'm mostly on the lookout for big effect sizes."
  • "I read the abstract, then I spend a few minutes th
... (Read more)

Context: My experience is primarily with psychology papers (heuristics & biases, social psych, and similar areas), and it seems to generalize pretty well to other social science research and fields with similar sorts of methods.


One way to think about this is to break it into three main questions:

1. Is this "result" just noise? Or would it replicate?

2. (If there's something besides noise) Is there anything interesting going on here? Or are all the "effects" just confounds, statistical artifacts, demonstrating the obvious, etc.

3. ... (Read more)

6 Answer by Elizabeth 16h: If a psychology study doesn't prominently say who its subjects were, the answer is "undergrads at the university, predominantly those in psychology classes" and it is worthless.
5 habryka 15h: I mean, lots of phenomena are likely to still be present in undergraduate psychology students, so it seems weird to say that the results are going to be worthless. Seems to me like it depends on the domain on how much you expect results to generalize from that population to others.