I've been active in the meatspace rationality community for years, and have recently started posting regularly on LW. Most of my posts and comments are about AI and alignment.
Posts I'm most proud of, and / or which provide a good introduction to my worldview:
I also wrote a longer self-introduction here.
PMs and private feedback are always welcome.
Yeah, there are probably various methods you can use and precautions you can take to participate securely without leaking any significant information. But any such plan is necessarily more complicated, and carries more risk of a slip-up, than blanket abstention. As the importance of the secrets you have to keep goes up, the benefit of simplicity probably outweighs whatever benefit you derive from participating in prediction markets, especially play-money ones.
Where someone personally draws the line on whether to participate or not depends on: the relative value they place on keeping secrets, how many important secrets they have, how much benefit they derive from participating in prediction markets, and how much they enjoy implementing complicated information-theoretic secret-keeping schemes.
To add on to the point about information-theoretic secret-keeping, if Manifold, market participants, and people with secrets want to not leak info in the future, my advice for each group is:
engineer pandemics or conduct military operations could lead to severe derailment without satisfying this definition.
I think humans could already do those things pretty well without AI, if they wanted to. Narrow AI might make those things easier, possibly much easier, just like nukes and biotech research have in the past. I agree this increases the chance that things go "off the rails", but I think once you have an AI that can solve hard engineering problems in the real world like that, there's just not that much further to go to full-blown superintelligence, whether you call its precursor "narrow" or not.
The probabilities in my OP are mostly just gut-level wild guesses, but they're based on the intuition that it takes a really big derailment to halt frontier capabilities progress, which mostly happens in well-funded labs that have the resources and will to keep operating through pretty severe "turbulence": economic depression, war, pandemics, restrictive regulation, etc. Even if new GPU manufacturing stops completely, there are already a lot of H100s and A100s lying around, and I expect those are sufficient to get pretty far.
When insider trading happens in the stock market, it's usually not the exchange that is liable or responsible for doing anything about it.
If traders with inside information were trading in this market, I'd like to know how they came to know the information, and whether those who shared it agreed to abide by an embargo.
If someone who agreed to abide by an embargo directly traded in the market or knowingly shared information with someone who did, I would consider that an ethical breach. If the sharing was unintentional, I wouldn't consider it a breach of ethics, but I would question that person's ability to keep secrets of even mild importance in the future.
If insiders learned of the letter's existence through more tenuous or indirect links, I still think this is a failure of secret-keeping on the part of CAIS and the signatories, but it may be that there is no one person who is directly responsible. Still, we should strive for better secrecy when it matters, and remind everyone of the basics of information-theoretic secret-keeping (act exactly as you would if you knew nothing, consider counterfactuals, etc.).
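To make the leakage concern above a bit more concrete, here's a toy sketch (my own illustration; the function name and the example numbers are made up) of how much information observers gain when a market price, read as a probability, moves after a trade:

```python
import math

def bits_leaked(prior: float, posterior: float) -> float:
    """KL divergence D(posterior || prior) in bits, for a binary event.

    A rough measure of the information observers gain when a market
    price (read as a probability) moves from `prior` to `posterior`.
    """
    def term(q: float, p: float) -> float:
        return 0.0 if q == 0 else q * math.log2(q / p)
    return term(posterior, prior) + term(1 - posterior, 1 - prior)

# A trade that pushes a 10% market up to 60% leaks roughly one bit:
print(round(bits_leaked(0.10, 0.60), 2))  # ~1.08 bits
```

The point being: even a single trade in a thin play-money market can constitute a meaningful update for observers, which is part of why blanket abstention is the simple option.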
That just mistakes how the human mind and human intelligence work. Our brains are not made to think in terms of probability.
I didn't intend to claim anything about how the brain or human intelligence works. Rather, I'm saying probability theory points at a correct way to reason for ideal agents, which humans can try to approximate. I expect approximations which involve thinking explicitly in terms of probabilities (not necessarily only in terms of probabilities) will tend to outperform approximations that don't.
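As a toy illustration of what "thinking explicitly in terms of probabilities" can look like in practice (all numbers here are invented for the example):

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) via Bayes' rule, for a binary hypothesis H."""
    joint_h = prior * p_e_given_h
    joint_not_h = (1 - prior) * p_e_given_not_h
    return joint_h / (joint_h + joint_not_h)

# A low-prior hypothesis barely moves when the evidence is nearly as
# likely under mundane alternative explanations:
print(round(bayes_update(0.001, 0.5, 0.4), 4))  # 0.0012
```

Humans can't literally run this over all their beliefs, but explicitly writing down even rough numbers like these tends to catch errors that purely verbal reasoning misses.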
Anyway, back to the object level: I would welcome more evidence on the question of aliens, but I personally don't feel that confused by current observations, and believe they are well-explained by higher prior probability hypotheses that do not involve aliens.
Perhaps the reason this post received some downvotes: it reads somewhat as a call for others to do expensive investigatory work and / or deductive thinking. Personally, I feel I've already done enough investigation and deduction on my own on this topic, and more (by myself or others) is probably not worth the effort.
Note, there's sometimes a tradeoff between gathering more facts and thinking longer to deduce more from the facts you already have. In this case, I think there's already more than enough evidence available for an ideal agent to conclude from a cursory inspection that the observed evidence is not well-explained by actual aliens. But you don't need to be an ideal agent to draw similar conclusions: you merely need to apply some effort and reasoning skills which are pretty common among LW readers, but not so common outside these circles (some of the skills I have in mind are those described by the bullet points in my reply here.)
A possible explanation: both brains and LLMs are somehow solving the symbol grounding problem. It may be that the most natural solutions to this problem share commonalities, or even that all solutions are necessarily isomorphic to each other.
Anyone who has played around with LLMs for a while can see that they are not just "stochastic parrots", but I think it's a pretty big leap to call anything within them "human-like" or "brain-like".
If an AI (perhaps a GOFAI or just an ordinary computer program) implements addition using the standard algorithm for multi-digit addition that humans learn in elementary school, does that make the AI human-like? Maybe a little, but it seems less misleading to say that the method itself is just a natural way of solving the same underlying problem. The fact that AIs are becoming capable of solving more complex problems that were previously only solvable by human brains seems more like a fact about a general increase in AI capabilities, than a result of AI systems getting more "brain-like".
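For concreteness, the grade-school algorithm in question is only a few lines in any programming language; here's one sketch in Python (function name mine):

```python
def grade_school_add(a: str, b: str) -> str:
    """Multi-digit addition via the right-to-left carry algorithm."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad to equal width
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # write down the ones digit
        carry = total // 10             # carry the tens digit
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(grade_school_add("478", "64"))  # 542
```

Nothing about this code is "human-like" beyond the fact that it solves the same problem in the same natural way.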
Saying that any system which solves a problem via methods similar to humans' is brain-like seems to unfairly privilege the specialness / uniqueness of the brain. Claims like that (IMO wrongly) imply that those solutions somehow "belong" to the brain, simply because that is where we first observed them.
I see, thanks for clarifying. I agree that it might be straightforward to catch bad behavior (e.g. deception), but I expect that RL methods will work by training away the ability of the system to deceive, rather than the desire.[1] So even if such training succeeds, in the sense that the system robustly behaves honestly, it will also no longer be human-level-ish, since humans are capable of being deceptive.
Maybe it is possible to create an AI system that is like the humans in the movie The Invention of Lying, but that seems difficult and fragile. In the movie, one guy discovers he can lie, and suddenly he can run roughshod over his entire civilization. The humans in the movie initially have no ability to lie, but once the main character discovers it, he immediately realizes its usefulness. The only thing that keeps other people from making the same realization is the fictional conceit of the movie.
Or, paraphrasing Nate: the ability to deceive is a consequence of understanding how the world works on a sufficiently deep level, so it's probably not something that can be trained away by RL, without also training away the ability to generalize at human levels entirely.
OTOH, if you could somehow imbue an innate desire to be honest into the system without affecting its capabilities, that might be more promising. But again, I don't think that's what SGD or current RL methods are actually doing. (Though it is hard to be sure, in part because no current AI systems appear to exhibit desires or inner motivations of any kind. I think attempts to analogize the workings of such systems to desires in humans and components in the brain are mostly spurious pattern-matching, but that's a different topic.)
In the words of Alex Turner, in RL, "reward chisels cognitive grooves into an agent". Rewarding non-deceptive behavior could thus chisel away the cognition capable of performing the deception, but that cognition might be what makes the system human-level in the first place.
Here's one plausible alternative explanation of the facts that I happened to come across on Twitter: https://twitter.com/erikphoel/status/1667197430197022722
While I would like to know whether or not aliens visited earth, I think it's more useful to simply take the stance "I don't know" instead of thinking in terms of probability.
You can't really "not think in terms of probability" by refusing to think about them explicitly. It can sometimes be a useful exercise to think about or write down everything in terms of numerical probabilities and likelihoods and then throw that away and "go with your gut", but if your beliefs are coherent they imply an underlying probabilistic model, whether you acknowledge it or not.
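The classic argument for that last claim is the Dutch book: if your stated odds on an event and its complement don't cohere, someone can bet against you at your own prices and profit no matter what happens. A toy sketch (function name and prices invented for illustration):

```python
def dutch_book_profit(price_a: float, price_not_a: float) -> float:
    """Guaranteed profit from selling a trader both a "$1 if A" ticket
    and a "$1 if not-A" ticket at the trader's own stated prices.

    Exactly one ticket pays out $1 whatever happens, so if the prices
    sum to more than 1, the seller locks in the difference risk-free.
    """
    income = price_a + price_not_a  # collected up front
    payout = 1.0                    # exactly one ticket pays
    return income - payout

# Incoherent prices (0.7 + 0.5 = 1.2) hand the bettor a sure $0.20:
print(round(dutch_book_profit(0.7, 0.5), 2))  # 0.2
```

Refusing to state numbers doesn't protect you from this; it just means the incoherence shows up in your actions instead of your words.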
The mental move of directly rounding down "my priors against aliens are high" -> "no aliens" -> "no need to do anything" is bad, because if enough people make it we won't get more evidence.
Investigating the question of whether aliens have visited earth may be valuable enough to overcome the low prior. However, I predict in advance that congressional hearings are not going to yield much evidence on this question one way or the other, unless they're focused on investigating e.g. the specific claims in the tweet thread above. In general, these kinds of hearings are not known for their truth-seeking or fact-finding ability.
Probably not directly relevant to most of the post, but I think:
we don't know if "most" timelines are alive or dead from agentic AI, but we know that however many are dead, we couldn't have known about them. if every AI winter was actually a bunch of timelines dying, we wouldn't know.
Is probably false.
It might be the case that humans are reliably not capable of inventing catastrophic AGI without a certain large minimum amount of compute, experimentation, and researcher thinking time which we have not yet reached. A superintelligence (or smarter humans) could probably get much further much faster, but that's irrelevant in any worlds where higher-intelligence beings don't already exist.
With hindsight and an inside-view look at past trends, you can retro-dict what the past of most timelines in our neighborhood probably look like, and conclude that most of them have probably not yet destroyed themselves.
It may be that going forward this trend does not continue: I do think most timelines including our own are heading for doom in the near future, and it may be that the history of the surviving ones will be full of increasingly implausible development paths and miraculous coincidences. But I think the past is still easily explained without any weird coincidences if you take a gears-level look at the way SoTA AI systems actually work and how they were developed.
Another potential downside of this approach: it places a lot of constraints on the AI itself, which means it probably has to be strongly superintelligent to start working at all.
I think an important desideratum for any alignment plan is that your AI system starts working gradually, with a "capabilities dial" that you (and the aligned system itself) turn up just enough to save the world, and no more.
Intuitively, I feel like an aligned AGI should look kind of like a friendly superhero, whose superpowers are weak superintelligence, superhuman ethics, and a morality as close as possible to the coherence-weighted, extrapolated average morality of all currently existing humans (probably not literally; I'm just trying to gesture at the general idea of averaging over collective extrapolated volition / morality / etc.).
Brought into existence, that superhero would then consider two broad classes of strategies:
From my own not-even-weakly superhuman vantage point, (2) seems like a much easier and less fraught strategy than (1). If I were a bit smarter, I'd try saving the world without AI or enhancing myself any further than I absolutely needed to.
Faced with the problem that the boxed AI in the QACI scheme is facing... :shrug:. I guess I'd try some self-enhancement followed by solving problems in (1), and then try writing code for a system that does (2) reliably. But it feels like I'd need to be a LOT smarter to even begin making progress.
Provably safely building the first "friendly superhero" might require solving some hard math and philosophical problems, for which QACI might be relevant or at least in the right general neighborhood. But that doesn't mean that the resulting system itself should be doing hard math or exotic philosophy. Here, I think the intuition of more optimistic AI researchers is actually right: an aligned human-ish level AI looks closer to something that is just really friendly and nice and helpful, and also super-smart.
(I haven't seen any plans for building such a system that don't seem totally doomed, but the goal itself still seems much less fraught than targeting strong superintelligence on the first try.)