In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.
This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.
In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.
This strikes some people as absurd or at best misleading. I disagree.
The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...
Yes, I agree. Whenever I think about things like this, I focus on how what matters for "when will AGI be transformational" is the idea of criticality.
I have written about this earlier, but the simple idea is that our human world changes rapidly when AI capabilities in some way lead to more AI capabilities at a fast rate.
This whole "is this AGI" question is basically irrelevant; all that matters is criticality. You can imagine subhuman systems reaching criticality, and you can equally imagine superhuman systems being needed.
There are many forms of criticality, and th...
previously: https://www.lesswrong.com/posts/h6kChrecznGD4ikqv/increasing-iq-is-trivial
I don't know to what degree this will wind up being a constraint. But given that many of the things that help in this domain have independent lines of evidence of benefit, it seems worth collecting them.
Food:
dark chocolate, beets, blueberries, fish, eggs. I've had good effects with strong hibiscus and mint tea (both vasodilators).
Exercise:
Regular cardio, stretching/yoga, going for daily walks.
Learning:
Meditation, math, music, enjoyable hobbies with a learning component.
Light therapy:
Unknown effect size, but increasingly cheap to test over the last few years. I was able to get Too Many lumens for under $50. Sun exposure has a larger effect size here, so exercising outside is helpful.
Cold exposure:
This might mostly just be exercise for the circulatory system, but cold showers might also have some unique effects.
Chewing on things:
Increasing blood...
The subtext here seems to be that such references are required. I disagree that they should be.
References are frequently helpful but often a pain to dig up, so there are tradeoffs at play. For this post, I think it was fine to omit them. I don't think the references would add much value for most readers, and I suspect Romeo wouldn't have found it worthwhile to post if he had to dig up all of the references before being able to post.
In January, I defended my PhD thesis, which I called Algorithmic Bayesian Epistemology. From the preface:
For me as for most students, college was a time of exploration. I took many classes, read many academic and non-academic works, and tried my hand at a few research projects. Early in graduate school, I noticed a strong commonality among the questions that I had found particularly fascinating: most of them involved reasoning about knowledge, information, or uncertainty under constraints. I decided that this cluster of problems would be my primary academic focus. I settled on calling the cluster algorithmic Bayesian epistemology: all of the questions I was thinking about involved applying the "algorithmic lens" of theoretical computer science to problems of Bayesian epistemology.
Although my interest in mathematical reasoning about uncertainty...
Thanks! Here are some brief responses:
From the high-level summary here it sounds like you're offloading the task of aggregation to the forecasters themselves. It's odd to me that you're describing this as arbitrage.
Here's what I say about this anticipated objection in the thesis:
...For many reasons, the expert may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate
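To make the flavor of arbitrage concrete, here is a small numerical illustration of my own (not taken from the thesis), assuming the experts are paid with the Brier (quadratic) scoring rule:

```python
# Illustrative example (my own, not from the thesis): two experts paid with the
# Brier score reward 1 - (outcome - report)^2 can collude for a risk-free joint gain.

def brier(report: float, outcome: int) -> float:
    """Quadratic scoring rule reward for a probability report."""
    return 1.0 - (outcome - report) ** 2

beliefs = (0.2, 0.8)       # the experts genuinely disagree
collusive_report = 0.5     # both report the midpoint instead of their true beliefs

for outcome in (0, 1):
    truthful_total = sum(brier(b, outcome) for b in beliefs)
    collusive_total = sum(brier(collusive_report, outcome) for _ in beliefs)
    print(outcome, truthful_total, collusive_total)

# The truthful reports earn a combined 1.32 in *both* outcomes, while the collusive
# reports earn a combined 1.50 in both outcomes. The pair can therefore redistribute
# the guaranteed surplus so that each expert does strictly better no matter how the
# event resolves: the kind of arbitrage a principal may want to rule out.
```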
The following is an example of how, if one assumes that an AI (in this case an autoregressive LLM) has "feelings", "qualia", "emotions", whatever, it can be unclear whether it is experiencing something more like pain or something more like pleasure, even in quite simple settings that already come up a lot with existing LLMs. This dilemma is part of the reason why I think the philosophy of AI suffering/happiness is very hard and we most probably won't be able to solve it.
Consider the two following scenarios:
Scenario A: An LLM is asked a complicated question and answers it eagerly.
Scenario B: A user insults an LLM and it responds.
For the sake of simplicity, let's say that the LLM is an autoregressive transformer with no RLHF (I personally think that the...
I quality-downvoted it for being silly, but agree-upvoted it because AFAICT that string does indeed contain all the (lowercase) letters of the English alphabet.
This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.
Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.
We expect a halfer model, so P(Heads) = 1/2.
On the other hand, in order not to repeat Lewis' Model's mistakes, P(Heads|Monday) = 1/2.
But both of these statements can only be true if P(Monday) = 1.
And P(Tuesday), therefore, apparently has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!
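Spelling the step out (my reconstruction of the arithmetic, assuming the standard setup in which a Tuesday awakening happens only on Tails, so P(Heads|Tuesday) = 0):

```latex
% Law of total probability over the day of the awakening,
% assuming P(Heads | Tuesday) = 0 (a Tuesday awakening requires Tails):
\begin{align*}
P(\text{Heads}) &= P(\text{Heads}\mid\text{Monday})\,P(\text{Monday})
                 + P(\text{Heads}\mid\text{Tuesday})\,P(\text{Tuesday}) \\
                &= \tfrac{1}{2}\left(1 - P(\text{Tuesday})\right) + 0.
\end{align*}
% Setting the right-hand side equal to the halfer value 1/2 forces P(Tuesday) = 0.
```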
At this point, I think, you won't be surprised if I tell you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...
The link I use to get here only loads the comments, so I didn't find the "Effects of Amnesia" section until just now. Editing it:
"But in my two-coin case, the subject is well aware about the setting of the experiment. She knows that her awakening was based on the current state of the coins. It is derived from, but not necessarily the same as, the result of flipping them. She only knows that this wakening was based on their current state, not a state that either precedes or follows from another. And her memory loss prevents her from making any connection between the two. As a good Bayesian, she has to use only the relevant available information that can be applied to the current state."
Dictionary/SAE learning on model activations works poorly for anomaly detection, because you need to train the dictionary on a dataset, which means the anomaly needed to be in the training set.
How do you do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
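As a rough sketch of what an "uncertainty-estimation-like" signal could look like (my own toy illustration, not a method from this comment; a simple Gaussian density model stands in for whatever richer model or dictionary one would actually use):

```python
# Crude toy sketch: fit an ensemble of simple density models to resampled halves of
# the activation data, and treat disagreement between the members' log-density
# estimates as a signal that an activation is off the training distribution
# (i.e. the ensemble is unsure whether this activation is "on-distribution").
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(acts: np.ndarray, seed: int):
    """Mean and precision of a Gaussian fit to a random half of the activations."""
    half = acts[np.random.default_rng(seed).choice(len(acts), size=len(acts) // 2, replace=False)]
    mu = half.mean(axis=0)
    cov = np.cov(half, rowvar=False) + 1e-3 * np.eye(acts.shape[1])  # ridge for stability
    return mu, np.linalg.inv(cov)

def log_density(x: np.ndarray, mu: np.ndarray, prec: np.ndarray) -> float:
    """Unnormalized Gaussian log-density (enough for comparing disagreement)."""
    d = x - mu
    return float(-0.5 * d @ prec @ d)

def disagreement(x: np.ndarray, ensemble) -> float:
    """Variance of the ensemble's log-density estimates at x."""
    return float(np.var([log_density(x, mu, prec) for mu, prec in ensemble]))

# Toy "activations": correlated Gaussian data standing in for real model activations.
mixing = rng.normal(size=(16, 64))
train_acts = rng.normal(size=(5000, 16)) @ mixing + 0.1 * rng.normal(size=(5000, 64))
ensemble = [fit_gaussian(train_acts, seed=s) for s in range(5)]

in_dist = rng.normal(size=16) @ mixing + 0.1 * rng.normal(size=64)
off_dist = rng.normal(size=64) * 3.0   # does not share the training structure
print(disagreement(in_dist, ensemble), disagreement(off_dist, ensemble))
# The off-distribution activation should produce markedly higher disagreement.
```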
Lots of people already know about Scott Alexander/ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big Schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status.
Even if you think you have something worth saying, it remains unclear whether commenting there is worth your time: there's no sorting, only sifting, which suggests it attracts a small number of sifters rather than a large number of people who expect sorting.
Here are the first 11 paragraphs:
...Saar Wilf is an ex-Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.
His
My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890
Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.
Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg) for many helpful comments.
Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element that has an impressively low[1] value of f. But sadly, you have no guarantee that it will output the optimal element, nor that its output isn't also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).
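As an illustration of the black-box setting only (my own framing, not the construction this post goes on to develop): finding a low-f element is hard, but verifying the Oracle's claim is cheap, since we can evaluate f on whatever it returns. The hypothetical wrapper below accepts only verified satisficers and randomizes among them:

```python
# Sketch of the setting, not the post's method: we can always *check* a satisficing
# claim ourselves, and randomizing among several verified satisficers is one naive
# way to reduce the Oracle's control over which output actually gets used.
import random
from typing import Callable, TypeVar

X = TypeVar("X")

def satisficing_query(
    objective: Callable[[X], float],
    oracle: Callable[[Callable[[X], float]], X],
    threshold: float,
    n_candidates: int = 10,
) -> X:
    """Collect Oracle outputs, keep only those verified to satisfice, pick one at random."""
    verified = []
    for _ in range(n_candidates):
        candidate = oracle(objective)
        if objective(candidate) <= threshold:   # we check the claim ourselves
            verified.append(candidate)
    if not verified:
        raise RuntimeError("the Oracle produced no output meeting the threshold")
    return random.choice(verified)

# Toy stand-in for the Oracle (brute-force search over a small range), plus a
# hypothetical objective, just so the sketch runs end to end.
if __name__ == "__main__":
    toy_oracle = lambda f: min(range(-1000, 1000), key=f)
    objective = lambda x: (x - 137) ** 2
    print(satisficing_query(objective, toy_oracle, threshold=25.0))
```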
What questions can you safely ask the Oracle? Can you use it to...
The threat model here seems basically wrong and focused on sins of commission, when sins of omission are, if anything, an even larger space of threats, and one that also applies to 'safe' solutions reported by the Oracle.
Sure, I mostly agree with the distinction you're making here between "sins of commission" and "sins of omission". Unlike you, though, I believe that getting rid of the threat of "sins of commission" is extremely useful. If the output of the Oracle is just optimized to fulfill your satisficing goal and not for anything else, you've basically got...