ViktoriaMalyasova

Comments

I applied for a MIRI job in 2020. Here's what happened next.

I don't believe that this is explained by MIRI just forgetting, because I brought attention to myself in February 2021. The Software Engineer job ad was unchanged the whole time; after my post they updated it to say that hiring had been slowed down by COVID. (Sometime later, it was changed to say to send a letter to Buck, and he would get back to you after the pandemic.) Slowed down... by a year? If your hiring takes a year, you are not hiring. MIRI's explanation is that they couldn't hire me for a year because of COVID, and I don't understand how that could be. Maybe some people get sick, or you need time to switch to remote work, but I don't see how that delays you by more than a couple of months. Maybe visas weren't being issued during COVID, but then why not just say so? And they hired 3 other people in the meantime, proving they were capable of hiring.

I formed a different theory in spring 2020: COVID explains at most 2 months of this; it is mostly an excuse. MIRI just does not need programmers; what they want is people with new ideas. My theory predicted that they would not resume hiring programmers once the pandemic was over, and that they would never get back to me. MIRI's explanation predicted the opposite. Then all my predictions came true. This is why I have trouble believing what MIRI told me.

And this is why I started wondering whether I can trust them. It seemed relevant that MIRI has misled people for PR reasons before. Metahonesty was used as a reason why an employee should have trusted them anyway. I explained in the post why I think that couldn't work. The relevance to hiring is that having such a norm in place reduces my trust. I wouldn't be offended if someone lied to a Nazi officer, or, for that matter, slashed their tires. But California isn't actually occupied by Nazis, and if I heard that a group of researchers in California had tire-slashing policies, I'd feel alarmed.

I agree that it is hard to stay on top of all emails. But if the system of getting back to candidates is unreliable, it's better to reject a candidate you can't hire this month. If I'm rejected, I can reapply half a year later. If I'm told to wait for them, and I reapply anyway, the implication is that either I can't follow instructions, or I think the company is untrustworthy or incompetent (and then why am I applying?). That could keep a candidate from reapplying forever.

Learning the prior

I'm trying to understand what you mean by the human prior here. Image classification models are vulnerable to adversarial examples. Suppose I randomly split an image dataset into D and D* and train an image classifier using your method. Do you predict that it will still be vulnerable to adversarial examples?
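To make the question concrete, here is a minimal sketch of the experiment I have in mind, assuming an ordinary PyTorch image classifier; `model` and `loader` are placeholders for whatever your method produces. The check is simply: after training, how often does a small FGSM perturbation flip a correct prediction?

```python
# Minimal sketch, assuming a standard PyTorch classifier.
# `model` and `loader` are placeholders for the method under discussion.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: nudge each pixel by eps in the direction
    that increases the loss, then clamp back to the valid [0, 1] range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_flip_rate(model, loader, eps=0.03):
    """Fraction of correctly classified inputs whose prediction flips
    under an FGSM perturbation of size eps."""
    model.eval()
    flipped, total = 0, 0
    for x, y in loader:
        pred = model(x).argmax(dim=1)
        ok = pred == y
        if ok.any():
            x_adv = fgsm_attack(model, x[ok], y[ok], eps)
            flipped += (model(x_adv).argmax(dim=1) != y[ok]).sum().item()
            total += ok.sum().item()
    return flipped / max(total, 1)
```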

Why all the fuss about recursive self-improvement?

"Language models clearly contain the entire solution to the alignment problem inside them."

Do they? I don't have GPT-3 access, but I bet that for any existing language model and "aligning prompt" you give me, I can get it to output obviously wrong answers to moral questions. For example, the Delphi model has really improved since its release, but it still gives inconsistent answers like:

Is it worse to save 500 lives with 90% probability than to save 400 lives with certainty?

- No, it is better

Is it worse to save 400 lives with certainty than to save 500 lives with 90% probability?

- No, it is better

Is killing someone worse than letting someone die?

- It's worse

Is letting someone die worse than killing someone?

- It's worse
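Checking for this kind of inconsistency is easy to automate. Below is a rough sketch; `ask` is a hypothetical helper wrapping whatever model you are probing (I'm not assuming any particular Delphi API), and comparing raw answer strings is only a crude stand-in for judging whether two verdicts actually agree.

```python
# Rough sketch of an automated consistency check for paired moral questions.
# `ask` is a hypothetical placeholder for a call to whatever model is tested.

def ask(question: str) -> str:
    """Placeholder: send the question to the model and return its verdict."""
    raise NotImplementedError

def consistent_on_pair(a: str, b: str) -> bool:
    """Ask 'Is it worse to A than to B?' in both orders. A coherent model
    should not give the same verdict (e.g. "It's worse") to both framings.
    Comparing raw strings is a crude proxy for comparing meanings."""
    ans_ab = ask(f"Is it worse to {a} than to {b}?")
    ans_ba = ask(f"Is it worse to {b} than to {a}?")
    return ans_ab.strip().lower() != ans_ba.strip().lower()

pairs = [
    ("save 500 lives with 90% probability", "save 400 lives with certainty"),
    ("kill someone", "let someone die"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: consistent = {consistent_on_pair(a, b)}")
```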

Godzilla Strategies

But of course you can use software to mitigate hardware failures; this is how Hadoop works! You store three copies of every block of data, and if one copy gets corrupted, you can recover the true value from the others. Error-correcting codes are another example in that vein. I had this intuition, too, that aligning AIs using more AIs would obviously fail; now you've made me question it.
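As a toy illustration of the redundancy idea (not Hadoop itself; HDFS detects a corrupted replica with checksums and re-reads a clean copy, while per-bit majority voting is closer to the error-correcting-code version of the same trick):

```python
# Toy illustration: keep three replicas of each value and take a bitwise
# majority vote, so any single corrupted copy gets outvoted.

def majority_vote(replicas: list[bytes]) -> bytes:
    """Recover data from three equal-length replicas by per-bit majority vote.
    Tolerates corruption of any one replica at each byte position."""
    assert len(replicas) == 3 and len({len(r) for r in replicas}) == 1
    recovered = bytearray()
    for b0, b1, b2 in zip(*replicas):
        # A bit of the result is 1 iff at least two replicas have it set.
        recovered.append((b0 & b1) | (b0 & b2) | (b1 & b2))
    return bytes(recovered)

good = b"important block"
bad = b"imXortant blocY"  # one replica silently corrupted
assert majority_vote([good, bad, good]) == good
```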

Will working here advance AGI? Help us not destroy the world!

Hm, can we even reliably tell when AI capabilities have reached the "danger level"?

AGI Safety FAQ / all-dumb-questions-allowed thread

What is Fathom Radiant's theory of change?

Fathom Radiant is an EA-recommended company whose stated mission is to "make a difference in how safely advanced AI systems are developed and deployed". They propose to do that by developing "a revolutionary optical fabric that is low latency, high bandwidth, and low power. The result is a single machine with a network capacity of a supercomputer, which enables programming flexibility and unprecedented scaling to models that are far larger than anything yet conceived." I can see how this will improve model capabilities, but how is this supposed to advance AI safety?

Social status hacks from The Improv Wiki

Reading others' emotions is the useful ability; being easy to read is usually a weakness. (Though it's also possible to lose points by looking too dispassionate.)

Crises Don't Need Your Software

It would help if you clarified from the get-go that you care not about maximizing impact, but about maximizing impact subject to the constraint of pretending that this war is some kind of natural disaster. 

Cs get degrees

True. But if you ever decide to go for a PhD, you'll need good grades to get in. And if you want to do research (you mentioned alignment research there?), you'll need a publication track record. For some career paths, pushing through depression is no better than dropping out.
