I was recently surprised to notice that Anthropic doesn't seem to have a commitment to publish its safety research.[1] It has a huge safety team but has only published ~5 papers this year (plus an evals report). So probably it has lots of safety research it's not publishing. E.g. my impression is that it's not publishing its scalable oversight and especially adversarial robustness and inference-time misuse prevention research.
Not-publishing-safety-research is consistent with Anthropic prioritizing the win the race goal over the help all labs improve s...
I disagree. I think the standard of "Am I contributing anything of substance to the conversation, such as a new argument or new information that someone can engage with?" is a pretty good standard for most/all comments to hold themselves to, regardless of the amount of engagement that is expected.
Man, I have such contradictory feelings about tuning cognitive strategies.
Just now I was out on a walk, and I had to go up a steep hill. And I thought "man I wish I could take this as a downhill instead of an uphill. If this were a loop I could just go the opposite way around. But alas I'm doing an out-and-back, so I have to take this uphill".
And then I felt some confusion about why the loop-reversal trick doesn't work for out-and-back routes, and a spark of curiosity, so I thought about that for a bit.
And after I had cleared up my confusion, I was a happy...
Do counterfactual mugging scenarios violate the 2nd Kolmogorov Axiom, or Normalization?
I'm thinking about, eg, this situation:
Imagine a superintelligence (like a version of Pascal’s demon) approaches you and tells you it has flipped a coin. If the coin landed heads, the superintelligence would never interact with you. If it landed tails, it will ask you to give $100. If you agree to give $100 in the tails scenario, the superintelligence will retroactively reward you with $10,000 if the coin landed heads.
There are 4 possible outcomes, right? {Heads+Would Pay, Heads+Would Not Pay, Tails+Would Pay, Tails+Would Not Pay} To obey normalization, P(H+WP) + P(H+WNP) + P(T+WP) + P(T+WNP) = 1, right?
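As a sanity check on the normalization question, here is a toy model of mine (an assumption, not from the original post) in which the coin flip and the agent's disposition are independent; the four joint outcomes then partition the sample space and their probabilities sum to 1 for any disposition probability:

```python
from itertools import product

# Toy model (my assumption): coin flip and "would pay" disposition
# are independent random variables.
p_heads = 0.5
p_would_pay = 0.7  # any value in [0, 1]; normalization holds regardless

# Joint probability of each of the four outcomes.
probs = {
    (coin, pays): (p_heads if coin == "H" else 1 - p_heads)
                  * (p_would_pay if pays else 1 - p_would_pay)
    for coin, pays in product("HT", [True, False])
}

assert abs(sum(probs.values()) - 1.0) < 1e-12  # normalization holds
```

So on this (assumed) independence model, nothing about the setup itself forces a violation of normalization.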
No..? I called it "Pascal's Demon"? The hypothetical being is called "Pascal's Demon".
The reverse is also possible: an n-ary relation can be represented as an n-ary function which maps instances of the relation to the object "true" and non-instances to the object "false".
So which is better?
...Though Frege was interested primarily in reducing mathematics to logic, he succeeded in reducing an important part of logic to mathematics by defining relations in terms of functions. By contrast, Whitehead & Russell reduced an important
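The relation-to-function direction described above can be sketched concretely: a relation can be represented by its characteristic function, which maps instances to "true" and non-instances to "false", and the relation can be recovered from that function. (The domain and relation below are my own illustration, not from the quoted text.)

```python
# A binary relation "less than" on a small domain, given as a set of pairs:
less_than = {(0, 1), (0, 2), (1, 2)}

def less_than_fn(x, y):
    """Characteristic function: maps instances of the relation to True,
    non-instances to False."""
    return (x, y) in less_than

# The reverse direction: recover the relation from the function.
domain = range(3)
recovered = {(x, y) for x in domain for y in domain if less_than_fn(x, y)}
assert recovered == less_than
```

The round trip works in both directions, which is why "which is better?" is a question about convenience of primitives rather than expressive power.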
The current state of the art for salary negotiations is really bad. It rewards disagreeableness, stubbornness, and social skills, and is just so inelegant.
Here's a better way of doing salary negotiation:
Procedure via a two-sided sealed-bid auction, splitting the difference in bids[1]:
(I'm confident this isn't incentive-compatible. Consider the case where you happen to exactly know the other person's bid as an example. I do think it is a good baseline mechanism though. Because it isn't incentive compatible, both parties need to precommit to no further negotiation if .)
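A minimal sketch of the mechanism as I read it (function and parameter names are mine): the candidate submits the lowest salary they would accept, the employer submits the highest they would pay, and if the bids cross, the parties split the difference.

```python
def sealed_bid_salary(candidate_min, employer_max):
    """Two-sided sealed-bid auction, splitting the difference in bids.

    candidate_min: lowest salary the candidate will accept (their sealed bid).
    employer_max:  highest salary the employer will pay (their sealed bid).
    Returns the agreed salary, or None if the bids don't cross
    (both parties having precommitted to no further negotiation).
    """
    if candidate_min > employer_max:
        return None  # no deal
    return (candidate_min + employer_max) / 2

# Candidate bids 90k, employer bids 110k -> deal at the midpoint, 100k.
assert sealed_bid_salary(90_000, 110_000) == 100_000
# Bids don't cross -> no deal.
assert sealed_bid_salary(120_000, 110_000) is None
```

The non-incentive-compatibility mentioned above shows up directly here: if you knew the other side's bid exactly, you would bid just at their number and capture the whole surplus.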
Made a short math video. Target audience maybe kids in the fifth grade of elementary school who are interested in math. Low production quality... I am just learning how to do these things. English subtitles; the value of the video is mostly in the pictures.
The goal of the video is to make the viewer curious about something, without telling them the answer. Kids in the fifth grade should probably already know the relevant concept, but they still need to connect it to the problem in the video.
The relevant concept is: prime numbers.
Has anyone here investigated whether washing vegetables/fruits is worth it? Until recently I never washed my vegetables, because I classified that as a bullshit value claim.
Intuitively, if I am otherwise also not super hygienic (like washing my hands before eating) it doesn't seem that plausible to me that vegetables are where I am going to get infected from other people having touched the carrots etc... . Being in quarantine during a pandemic might be an exception, but then again I don't know if I am going to get rid of viruses if I am just lazily rinsi...
Good argument for organic food, if you're on the "I don't wash my fruits" side.
Billionaires read LessWrong. I have personally had two reach out to me after a viral blog post I made back in December of last year.
The way this works is almost always that someone the billionaire knows will send them an interesting post and they will read it.
Several of the people I've mentioned this to seemed surprised by it, so I thought it might be valuable information for others.
That could be great especially for people who are underconfident and/or procrastinators.
For example, I don't think anyone would want to send any money to me, because my blogging frequency is like one article per year, and the articles are perhaps occasionally interesting, but nothing world-changing. I'm like 99% sure about this. But just in the hypothetical case that I am wrong... or maybe if in the future my blogging frequency and quality increase but I forget to set up a way to sponsor me... if I find out too late that I was leaving money on the...
Using air purifiers in two Helsinki daycare centers reduced kids' sick days by about 30%, according to preliminary findings from the E3 Pandemic Response study. The research, led by Enni Sanmark from HUS Helsinki University Hospital, aims to see if air purification can also cut down on stomach ailments. https://yle.fi/a/74-20062381
See also tag Air Quality
Linkpost: "Against dynamic consistency: Why not time-slice rationality?"
This got too long for a "quick take," but also isn't polished enough for a top-level post. So onto my blog it goes.
I’ve been skeptical for a while of updateless decision theory, diachronic Dutch books, and dynamic consistency as a rational requirement. I think Hedden's (2015) notion of time-slice rationality nicely grounds the cluster of intuitions behind this skepticism.
I'm afraid I don't understand your point — could you please rephrase?
Here are a few observations I have made when it comes to going to bed on time.
I set up an alarm that reminds me when my target bedtime has arrived. Many times when I am lost in an activity, the alarm makes me remember that I made the commitment to go to bed on time.
I only allow myself to dismiss the alarm when I lie down in bed. Before lying down I am only allowed to snooze it for 8 minutes. To dismiss the alarm I need to solve a puzzle which takes 10s, making dismissing less convenient than snoozing. Make sure to carry your phone around wi...
I don’t think we are that far off from turning a book into a movie using AI. I just read H. P. Lovecraft’s At the Mountains of Madness, which is about discovering the remains of an ancient alien civilization in Antarctica. It would make an amazing movie, and Guillermo del Toro has been trying to do it, but since such a project is so expensive, he hasn’t been able to put the pieces together. An AI could make movies that just wouldn’t get made otherwise. And maybe it could make one for me on the fly. I could tell it that I love the style of The Road with elements of horror, and it would produce the movie just for me.
That seems about right to me.
**Progress Report: AI Safety Fundamentals Project** This is a public space for me to keep updated on my AI safety fundamentals project. The project will take 4 weeks. My goal is to stay lean and limit my scope so I can actually finish on time. I aim to update this post at least once per week with my updates, but maybe more often.
Overall, I want to work on agent foundations and the theory behind AI alignment agendas. One starting point for this is selection theorems: a research program to find justifications that a given training process will result in a ...
Well, haven't got much done in the last 2 weeks. Life has gotten in the way, and in the times where I thought I actually had the time and headspace to work on the project, things happened like my shoulder got injured playing sport, and my laptop mysteriously died.
But I have managed to create a github repo, and read the original posts on selection theorems. My list of selection theorems to summarize has grown. Check out the github page: https://github.com/jack-obrien/selection-theorems-review
Tonight I will try to do at least an hour of solid work on it. I w...
About 1 year ago, I wrote up a ready-to-go plan for AI safety focused on current science (what we roughly know how to do right now). It targets reducing catastrophic risks from the point when we have transformatively powerful AIs (e.g. AIs comparably capable to humans).
I never finished this doc, and it is now considerably out of date relative to how I currently think about what should happen, but I still think it might be helpful to share.
Here is the doc. I don't particularly want to recommend people read this doc, but it is possible that someone wil...
I disagree with the claim near the end that this seems better than Stop
At the start of the doc, I say:
It’s plausible that the optimal approach for the AI lab is to delay training the model and wait for additional safety progress. However, we’ll assume the situation is roughly: there is a large amount of institutional will to implement this plan, but we can only tolerate so much delay. In practice, it’s unclear if there will be sufficient institutional will to faithfully implement this proposal.
Towards the end of the doc I say:
...This plan requires qu
Some cities have dedicated LW/ACX Discord servers, which is pretty neat. Many of the cities hosting meetups over the next month are too small to have much traffic to such a server, were it set up. A combined, LW meetup oriented Discord server for all the smaller cities in the world, with channels for each city and a few channels for common small-meetup concerns, seems like a $20 bill on the sidewalk. So I’m checking whether such a thing exists here, before I start it.
Thanks! Just what I was looking for.
Some time back, Julia Wise published the results of a survey asking parents what they had expected parenthood to be like and to what extent their experience matched those expectations. I found those results really interesting and have often referred to them in conversation, and they were also useful to me when I was thinking about whether I wanted to have children myself.
However, that survey was based on only 12 people's responses, so I thought it would be valuable to get more data. So I'm replicating Julia's survey, with a few optional quantitative questi...
Oh oops, it wasn't. Fixed, thanks for pointing it out.
When thinking about deontology and consequentialism in application, I was trying to rate morality of actions based on intention, execution, and outcome. (Some cells are "na" as they are not really logical in real world scenarios.)
In reality, it seems to me that executing "some" intention matters most (though I am not sure how much) when doing something bad, while executing to the best of one's ability matters most when doing something good.
It also seems useful to me, when we try to learn about applications of philosophy from law. (I am not an expert though in neith...
I think this conditions on one problem with one goal; I haven't thought about other goods collectively (that's more of a discussion about consequentialism).
Regarding "best of personal ability": I think the purpose is to distinguish what one can do personally from what one can do collaboratively/collectively, but it seems I need to think that through better, so that is a good question.
My reasoning for the "na" cells: having the intention but no (or insufficient) execution while still doing good is more like an accident, which is the same as no intention, no execution, and doing good.
In some ways it doesn't make a lot of sense to think about an LLM as being or not being a general reasoner. It's fundamentally producing a distribution over outputs, and some of those outputs will correspond to correct reasoning and some of them won't. They're both always present (though sometimes a correct or incorrect response will be by far the most likely).
A recent tweet from Subbarao Kambhampati looked at whether an LLM can do simple planning about how to stack blocks, specifically: 'I have a block named C on top of a block named A. A is on tabl...
(I'll note that my timeline is both quite uncertain and potentially unstable - so I'm not sure how different it is from Jacob's, everything considered; but yup, that's roughly my model.)
I saw this in Xixidu's feed:
The article has a lot of information about the information processing rate of humans. Worth reading. But I think the article is equating two different things:
- The information processing capacity (of the brain; gigabits) is related to the complexity of the environment in whi