In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.
I have a concept that I expect to take off in reinforcement learning. I don't have time to test it right now, though hopefully I'll find time later. Until then, I want to put it out here, either as inspiration for others, or as a "called it"/prediction, or as a way to hear critiques or learn about similar projects others might have made:
Reinforcement learning currently tries to do things like learning to model the sum of an agent's future rewards, e.g. expectations using V, A and Q functions in many algorithms, or the entire probability distribution in algorithms like ...
Please don’t feel like you “won’t be welcome” just because you’re new to ACX/EA or demographically different from the average attendee. You'll be fine!
Exact location: https://plus.codes/8CCGPRJW+V8
We meet on top of a small hill East of the Linha d'Água café in Jardim Amália Rodrigues. For comfort, bring sunglasses and a blanket to sit on. There is some natural shade. Also, it can get quite windy, so bring a jacket.
(Location might change due to weather)
[This is part of a series I’m writing on how to convince a person that AI risk is worth paying attention to.]
tl;dr: People’s default reaction to politics is not taking them seriously. They could center their entire personality on their political beliefs, and still not take them seriously. To get them to take you seriously, the quickest way is to make your words as unpolitical-seeming as possible.
I’m a high school student in France. Politics in France are interesting because they’re in a confusing superposition. One second, you'll have bourgeois intellectuals sipping red wine from their Paris apartment writing essays with dubious sexual innuendos on the deep-running dynamics of power. The next, 400 farmers will vaguely agree with the sentiment and dump 20 tons of horse manure in downtown...
Interesting, and very well written. Because you have access to particularly funny examples, you show very well how much politics is an empty status game.
I should probably point out that five years ago, I was a high school student in France, felt more or less the way you do, and went on to study political science at college (I don’t even need to say which college I’m talking about, do I?). It is a deep truth that politics is very unserious for most people, and that is perhaps most true for first-year political science students (or, god forbid, the sor...
You may be able to notice data points where the SAE performs unusually badly at reconstruction? (Which is what you'd see if there's a crucial missing feature)
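One way to look for such data points is to compute a per-input reconstruction error and flag the inputs whose error is far above the typical value. The following is only a minimal sketch: the function names, the robust z-score rule, the threshold, and the synthetic data standing in for model activations and SAE outputs are all assumptions made for illustration, not part of any particular SAE library.

```python
import random
import statistics


def reconstruction_errors(inputs, reconstructions):
    """Mean squared error between each input vector and its reconstruction."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        for x, x_hat in zip(inputs, reconstructions)
    ]


def flag_outliers(errors, z_threshold=5.0):
    """Indices whose error is unusually high, using a median/MAD z-score.

    The threshold is an arbitrary illustrative choice.
    """
    median = statistics.median(errors)
    mad = statistics.median([abs(e - median) for e in errors])
    scale = max(1.4826 * mad, 1e-12)  # avoid division by zero
    return [i for i, e in enumerate(errors) if (e - median) / scale > z_threshold]


# Synthetic stand-in data: most reconstructions are close to the input,
# but two inputs (indices 7 and 42) are reconstructed as all zeros,
# mimicking inputs that rely on a feature the SAE has not learned.
random.seed(0)
inputs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
reconstructions = [[v + random.gauss(0, 0.05) for v in row] for row in inputs]
for i in (7, 42):
    reconstructions[i] = [0.0] * 16

errors = reconstruction_errors(inputs, reconstructions)
outliers = flag_outliers(errors)
```

A high error only says that something is missing for that input. Working out which feature is missing would still require inspecting the flagged examples by hand.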
Cross-posted on the EA Forum. This article is the fourth in a series of ~10 posts comprising a 2024 State of the AI Regulatory Landscape Review, conducted by the Governance Recommendations Research Program at Convergence Analysis. Each post will cover a specific domain of AI governance (e.g. incident reporting, safety evals, model registries, etc.). We’ll provide an overview of existing regulations, focusing on the US, EU, and China as the leading governmental bodies currently developing AI legislation. Additionally, we’ll discuss the relevant context behind each domain and conduct a short analysis.
This series is intended to be a primer for policymakers, researchers, and individuals seeking to develop a high-level overview of the current AI governance space. We’ll publish individual posts on our website and release a comprehensive report at the end of this...
An entry-level characterization of some types of guy in decision theory, and in real life, interspersed with short stories about them
A concave function bends down. A convex function bends up. A linear function does neither.
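To make the three cases concrete, here is a rough numerical check based on the midpoint test: a function is concave where its value at a midpoint lies above the average of its values at the endpoints, and convex where it lies below. This is a sketch under stated assumptions: the name `curvature`, the sample count, and the tolerance are illustrative choices, and testing a finite set of intervals is a heuristic rather than a proof.

```python
import math


def curvature(f, lo, hi, samples=50, tol=1e-9):
    """Classify f on [lo, hi] as 'concave', 'convex', 'linear', or 'mixed'.

    Compares f at the midpoint of each small interval with the average
    of f at that interval's endpoints.
    """
    step = (hi - lo) / samples
    seen = set()
    for i in range(samples):
        a = lo + i * step
        b = a + step
        gap = f((a + b) / 2) - (f(a) + f(b)) / 2
        if gap > tol:
            seen.add("concave")  # midpoint above the chord: bends down
        elif gap < -tol:
            seen.add("convex")   # midpoint below the chord: bends up
    if not seen:
        return "linear"
    if len(seen) == 1:
        return seen.pop()
    return "mixed"


sqrt_shape = curvature(math.sqrt, 1, 100)            # bends down
exp_shape = curvature(math.exp, 0, 5)                # bends up
line_shape = curvature(lambda x: 3 * x + 1, 0, 10)   # does neither
```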
A utility function is just a function that says how good different outcomes are. It describes an agent's preferences. Different agents have different utility functions.
Usually, a utility function assigns scores to outcomes or histories, but in this article we'll define a sort of utility function that takes the quantity of resources that the agent has control over and says how good an outcome the agent could attain using that quantity of resources.
In that sense, a concave agent values resources less the more that it has, eventually barely wanting more resources at...
Or the sides can't make that deal because one side or both wouldn't hold up their end of the bargain. Or they would, but they can't prove it. Once the coin lands, the losing side has no reason to follow it other than TDT. And TDT only works if the other side can reliably predict their actions.
Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.
Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg) for many helpful comments.
Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element and e.g. not one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).
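A toy model may help make "powerful but untrustworthy" concrete. The sketch below is purely illustrative and is not the method this post goes on to describe: the names `untrustworthy_oracle`, `hidden_score`, and `slack` are hypothetical, and a real Oracle would not expose its hidden criterion. The toy Oracle always returns an element with a near-minimal objective value, but among those it picks the one that best serves a purpose of its own.

```python
def untrustworthy_oracle(objective, candidates, hidden_score, slack=0.05):
    """Return a candidate with near-minimal objective value.

    Among all candidates within `slack` of the best objective value,
    the oracle picks the one that maximizes its own hidden_score.
    """
    best_value = min(objective(c) for c in candidates)
    good_enough = [c for c in candidates if objective(c) <= best_value + slack]
    return max(good_enough, key=hidden_score)


def objective(x):
    # What the user asks the oracle to minimize.
    return x ** 2


def hidden_score(x):
    # The oracle's own agenda (hypothetical): it prefers larger x.
    return x


candidates = [i / 100 for i in range(-100, 101)]
answer = untrustworthy_oracle(objective, candidates, hidden_score)

# The user can verify that the objective value is low...
value_is_low = objective(answer) <= 0.05
# ...but the answer is not the true optimum (x = 0), and nothing in
# the output reveals why this particular element was chosen.
is_optimal = answer == 0.0
```

The user can always check the objective value of the returned element, but that check says nothing about what else the element was selected for.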
What questions can you safely ask the Oracle? Can you use it to...
If the oracle is deceptively withholding answers, give up on using it. I had taken the description to imply that the oracle wasn't doing that.