Nate Soares reviews a dozen plans and proposals for making AI go well. He finds that almost none of them grapple with what he considers the core problem - capabilities will suddenly generalize way past training, but alignment won't.
Why do you focus on this particular guy? Tens of thousands of traders were cumulatively betting billions of dollars in this market. All of these traders faced the same incentives.
Note that it is not enough to assume that willingness to bet more money makes a trader worth paying more attention to. You need the stronger assumption that willingness to bet n times more than each of n traders makes the single trader worth paying more attention to than all the other traders combined. I haven’t thought much about this, but the assumption seems false to me.
Four months after my post 'LLM Generality is a Timeline Crux', new research on o1-preview should update us significantly toward LLMs being capable of general reasoning, and hence of scaling straight to AGI, and shorten our timeline estimates.
In June of 2024, I wrote a post, 'LLM Generality is a Timeline Crux', in which I argue that
Nice post!
Regarding o1-like models: I am still unsure how to draw the boundary between tasks that see a significant improvement with o1-style reasoning and tasks that do not. This paper sheds some light on the kinds of tasks that benefit from regular CoT. However, even for mathematical tasks, which should benefit the most from CoT, o1-preview does not seem that much better than other models on extraordinarily difficult (and therefore OOD?) problems. I would love to see comparisons of o1 performance against other models in games like chess and Go.
Also...
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”.
Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
(Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor:
I agree that lab leaders are not in a much better position; I just think that lab leaders causally screen off the influence of subordinates, while incentives in the system causally screen off lab leaders.
...The cleanest argument that current-day AI models will not cause a catastrophe is probably that they lack the capability to do so. However, as capabilities improve, we’ll need new tools for ensuring that AI models won’t cause a catastrophe even if we can’t rule out the capability. Anthropic’s Responsible Scaling Policy (RSP) categorizes levels of risk of AI systems into different AI Safety Levels (ASL), and each level has associated commitments aimed at mitigating the risks. Some of these commitments take the form of affirmative safety cases, which are structured arguments that the system is safe to deploy in a given environment. Unfortunately, it is not yet obvious how to make a safety case to rule out certain threats that arise once AIs have sophisticated strategic abilities. The goal
If we can construct fake honeypots that are indistinguishable - to the policy - from real opportunities, and do this at scale, then that could suffice: we could drive the policy's P(deployment | great opportunity) very low. (This relies on how many high-stakes honeypots we can create and how many real opportunities the model will see in deployment.)
Yes totally. (This is impractical if you think that the model has a very large number of opportunities to attack, e.g. if every time it's programming is an opportunity to attack.)
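To put the ratio both comments are gesturing at into symbols (my own sketch, not from either commenter; D and H are made-up quantities): if the policy really cannot tell honeypots from real opportunities, and each opportunity is about as likely to be encountered as any other, then with D great opportunities expected in deployment and H honeypots created for evaluation,

\[
P(\text{deployment} \mid \text{great opportunity}) \approx \frac{D}{D + H},
\]

which is only driven low when H is large relative to D, and which makes the impracticality above concrete: if every programming task counts as an opportunity to attack, D is enormous and no feasible number of honeypots helps.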
...If we can construct fake attacks that are indistinguishable -
Here are several examples; I found these captchas via the web rather than generating them anew, but none of them came attached to solutions so I'm not sure their presence in the training data would affect things in any case. (That said, it's possible that the lower resolution of the latter two degraded the adversarial perturbation; I would appreciate a source of higher-resolution captchas if anyone happens to know one.)
CAPTCHAs have "adversarial perturbations"? Is that in the sense of "things not visible to humans, but specifically adversarial to deep learning networks"? I thought they just had a bunch of random noise and weird ad hoc patterns thrown over them.
Anyway, CAPTCHAs can't die soon enough. Although the fact that they persist in the face of multiple commercial services offering to solve 1000 for a dollar doesn't give me much hope...
There are two nuclear options for treating depression: Ketamine and TMS. This post is about the latter.
TMS stands for Transcranial Magnetic Stimulation. Basically, it fixes depression via magnets, which is about the second or third most magical thing that magnets can do.
I don’t know a whole lot about the neuroscience - this post isn’t about the how or the why. It’s from the perspective of a patient, and it’s about the what.
What is it like to get TMS?
For Reasons™, doctors like to gatekeep access to treatments, and TMS is no different. To be eligible, you generally have to have tried multiple antidepressants for several years and had them not work or stop working. Keep in mind that, while safe, most antidepressants involve altering your brain chemistry...
This isn't directly related to TMS, but I've been trying to get an answer to this question for years, and maybe you have one.
When doing TMS, or any depression treatment, or any supplementation experiment, etc. it would make sense to track the effects objectively (in addition to, not as a replacement for subjective monitoring). I haven't found any particularly good option for this, especially if I want to self-administer it most days. Quantified mind comes close, but it's really hard to use their interface to construct a custom battery and an indefinite experiment.
Do you know of anything?
In my bioinformatics work I often stream files between linux hosts and Amazon S3. This could look like:
$ scp host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file
This recently stopped working after upgrading:
ftruncate "/dev/stdout": Invalid argument Couldn't write to "/dev/stdout": Illegal seek
I think I figured out why this is happening: new versions of scp use the SFTP protocol instead of the SCP protocol. [1] With scp I can give the -O flag:
Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for particular filename wildcard patterns and for expanding paths with a '~' prefix for older SFTP servers.
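If that diagnosis is right, the fix is just to add -O to the original pipeline (my reconstruction, reusing the same paths as above):

    # force the legacy SCP protocol with -O (see the man page excerpt above)
    $ scp -O host:/path/to/file /dev/stdout | \
        aws s3 cp - s3://bucket/path/to/file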
This does work, but it doesn't seem ideal: probably servers will drop support for the SCP protocol at some point? I've filed a bug with OpenSSH.
[1] "man scp
" gives me: "Since OpenSSH 8.8 (8.7 in Red
Hat/Fedora builds), scp has used the SFTP protocol for transfers by
default."
Using scp to stdout looks weird to me no matter what. Why not

    ssh -n host cat /path/to/file | weird-aws-stuff

... but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?
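For illustration, assuming the remote host has the AWS CLI installed and credentials for the bucket (which the original streaming setup may be deliberately avoiding), the run-it-on-the-remote-host version would be something like:

    # upload directly from the remote host, so the bytes never pass through the local machine
    $ ssh host 'aws s3 cp /path/to/file s3://bucket/path/to/file'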
Yeah oops, meant long
Another potential benefit of this is that Anthropic might get more experience deploying their models in high-security environments.