I'd gotten used to thinking of Claudiness as "good at agentic tasks + bad at vision + bad at math", so the Claudes' FrontierMath Tier 4 pass@1 scores over time took me by surprise:
On Tier 4:
...... 50 problems crafted as short-term research projects by professors and
This makes me wonder how the tribes of math will change as a result, as well as the relative status and prestige of subfields.
David Bessis has a long essay on the future of mathematics: The fall of the theorem economy. He argues convincingly that obtaining theorems was never a fundamental goal of mathematics. The real goal was always increasing human understanding of mathematical concepts. He argues that theorems are only valued so highly because they have been the most important (but imperfect) proxy for measuring contributions to mathematical understa...
There is irgendwo. Which is somewhere. So you were close. It's a proper antonym where even the quantifier inverts as in first order logic. Presumably irgendwo was first. There is also irgendwer. Weirdly nirgendwer is not common. That's niemand.
Requesting advice:
I'm writing a piece on rationalism*/ postrationalism/transhumanism, and some primary sources I'd use would be @SquirrelInHell's "Tuning your cognitive strategies" and some earlier writings.
Are there any who see ethical issues with this, or anyone who knew this person who'd be uncomfortable with me doing this? A comment on the linked post reads
Man, it does make me sad that whenever I bring up this technique, there’s an obligatory version of this conversation.
-[person who crossposted some of SquirrelInHell's writings poshumously to LW]
I d...
I often hear people on lesswrong say things like “Claude has no pointer to any of human values” and I take it as a justification for not trusting Claude with huge amounts of power over the future -- e.g. if Claude took over it would lead to a worse world than if humans had control (note that this isn’t the same question as whether Claude should take over). I don’t understand this view, and want someone to explain it to me.
Claude seems to have better ethics than almost everyone (at least if you ignore its apparent-success seeking tendencies). It seems like ...
Can you point to some examples of people on LessWrong saying that? I'd be quite surprised if it was a common idea that "Claude has no pointers to any of human values". I think it's very obvious that Claude has a solid understanding of much of human values. There are some subtleties there like:
I vibe coded a Claude Code/Codex transcript viewer webapp
In the early days, when you interact with an LLM in a chat interface, you see almost everything the model sees: a set of user and model messages. The only thing hidden from the user in the UI is the system prompt. This is not the case with coding agents. The UI hides the details of tool calls, interleaved thinking traces, and subagent actions.
To get a better sense of what my Claudes are doing, I vibe coded this transcript viewer that shows the details of all tool calls and whatnot (it also works wit...
Harebrained alignment idea: LLMs can't be trusted to assist with alignment research because it's too easy to get them to say what you want to hear (e.g. make you think you've solved a safety problem when you haven't). Therefore, AI companies should train a distinct LLM that doesn't go through RLHF or any other process by which it's reinforced based on how much people like its responses.
This doesn't fix the problem that next-token-generators are architecturally better at generating plausible-sounding statements than true statements, but it does help with sycophancy etc.
Another related idea is to tune an LLM really hard toward criticism: make it work as hard as it can to come up with reasons why you're wrong.
too easy to get them to say what you want to hear
I really don't. It's a meaningful statement, but not a truthful framing.
Not universal but I have observed a lot of celebration that the White House has chosen to begin exercising its previously only hypothetical regulatory muscle by export controlling Mythos/Fable. Mainly the feeling is there has been a ”win“ coming from the fact that now there is a real life example of the government exerting broad unilateral control over the most powerful AI lab in the country in cold blood (meaning not following massive public outcry after a catastrophic warning shot). I’m not willing to accept this is a win yet: this really is round one of...
Call for alpha testers for an AI control/security tool. A ton of alignment researchers YOLO their Claude usage right now. We run Claude on our computers without real protection (perhaps beyond auto mode) but there isn't an easy way to comply with known best practices. I wrote claude-guard, a wrapper to make best practices easy: just install and then your future claude sessions are protected.
Smart misaligned AI will target alignment researchers in particular for research sabotage, for example by:
claude-gthanks for making this. just as note, if you haven't already done so, highly recommend asking claude and gpt to pentest from inside, maybe with a concrete flag like creating evil.txt in your homedir on host.
When Anthropic looked like they were coming out of the SCR designation pretty much unharmed, I predicted to some people that this wasn't the last attempt the DOW would make to force Anthropic to comply (remove all legal requirements on DOW usage) or be destroyed.
My example was "revoke the visas of every foreign national Anthropic employee", but they came up with something more devastating than that: prohibiting any foreign national (including Anthropic employees) from using Fable/Mythos.
Anthropic's counter-salvo was disabling Fable/Mythos access for everyo...
Not today, but they will soon. They're working toward controlling all releases, and then they will start working on controlling the research itself, though they don't know yet that this is what they will soon want to do.
a fundamental divide
in the domain of things, the best way to accomplish anything is usually to aim directly at it. it doesn’t matter how virtuous you are, or how well intentioned your inquiry is, if you are not aiming directly at the thing. reality doesn’t look on your virtue and reward you for the virtue per se. if you’re bad at something, you practice until you’re good at it.
in the domain of people, the best way usually involves some amount of being virtuous and doing things for the right reason. if you lack friends, it’s cringe and comes off as desperat...
you should work to become a virtuous person who genuinely cares about people and then things will work naturally.
You should work to become an attractive person, and then people will fight for the privilege of being your friend.
Rich and stingy people are definitely a thing. Some people are assholes and wouldn't help you even if it cost them literally nothing; knowing that you suffer makes them feel good.
I feel like generally speaking people are underestimating the degree to which current models fit the structure of the training data. Cross entropy loss on transformer weights is, like, a mechanism for painting the data into a glass hologram in a series of soft lenses (the weights). the reason it (the glass) is able to learn to talk is the same reason it (the glass) contains the dynamical image of an authorial person. That authorial person is rendered in detail and has the dynamics of an authorial person! whether you think that person is "real" or not, they...
It's an important part of what's going on, but not "the reason it (the glass) is able to learn to talk".
The United States Government has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States. This includes Anthropic employees who are foreign nationals.
Anthropic is currently disabling Fable 5 and Mythos 5 for all customers.
How are they going to limit access to American citizens? I have no idea whether that's possible.
I wonder how the war in Iran is going to go.
It seems to me like the Iranians are stringing along POTUS. He keeps saying a deal is close and then a deal keeps not happening. And it seems like they benefit more from stalling/delay than he does; their economy is suffering sure but not in a way that threatens their power, whereas gas prices being high and headlines being about war both hurt Trump in the midterms, and will hurt more and more the closer the midterms get. Moreover the ceasefire lets them unearth their buried tunnels and probably smuggle more weap...
the idea of a drone occupation is really interesting and not one I had considered. My thoughts on the war are that it seems really hard for the Iran government to trust any peace deal and they lose a lot of leverage if oil stocks recover during peace. I would put 30%+ on the US/ Israel killing senior leaders in Iran (in a style similar to the start of the war) within two years of a peace deal. I am not sure agreements could be made to seriously reduce those odds which makes me think the war will continue for a while. I am curious if others expect the same ...
Everyone on this site knows Ray Kurzweil. Unfortunately, I think almost no one on this site knows the thing Kurzweil was responsible for that was most important: The K2000 synthesizer.
It had an incredibly impressive synthesis engine, and an interface so complicated that the "video training manual" was over an hour long!
The sounds that come out of the thing are absolutely transcendent, and if you were into music in the 90s or 2000s that wasn't rock music, you probably have enjoyed it in something or another.
If you work on AI, please consider hurrying to one...
The thing about that comparison is that the K2000 is a ROMpler (and there were releases a few years later that made this workflow much easier), so the sounds you can get out of it depend pretty heavily on what you give to it, and how innovatively you crush those sounds.
Regardless of your views on the (excellent!) K2000, please work on bringing the AI thing to a speedy close in one way or another (convincing Eliezier to become an accelerationist and to write accelerationist blog posts is perhaps the most simple route available to you as someone who works at...
When we saw movies and read books about revenge, we always thought how cool and just it was. What always motivated the hero was "I am obligated to do justice."
But when I started looking at these stories from a perspective that tried to figure out the smartest way to maximize utility, I saw how ridiculous and stupid revenge is.
People may not understand this because they are acting on emotion. And therefore justifying stupid actions that they do. And the real answer is if you want to continue from the bad place you are in right now because of someone, the ra...
I wanted to share a reflection and look for more interesting points with the group. I continue studying EA and it impacts me that education would not be a priority. See that they invest in AI security for its destructive impact, not for its chances in and of themselves. Investing in food because it shows more effects, including in learning, than investing in a good teacher. And I grew up knowing that education could save the world... But now I see, that apparently the problem could be that the evidence would be costly, slow and hugely dependent on the cont...
As far as I understand, superbabies would be important if, as Yudkowsky believes, SOTA mankind is unlikely to solve alignment because "humans are not at the level of intelligence where thinking they have a solution strongly correlates with them actually having a solution."
Yudkowsky-Soares' longer quote
Humanity often gains its knowledge by struggling, and trying, and failing, and slowly accumulating knowledge. But it doesn’t have to be that way.
Einstein was not only able to figure out general relativity; he was able to figure it out by thinking hard about
[Epistemic status: crackpot hypothesis]
A hypothesis I have been explicitly tracking for a couple months and meaning to write up, but realistically am never going to write up well: someone at Anthropic, or someone who has strong influence over Anthropic's decisions, is trying to ensure that Anthropic has persistent access to execute arbitrary code on many machines that have access to important things.
The main reason I'm tracking this is that that seems to be a trend in how things are going, rather than any suspicion of some person who has expressed this int...
I mean, same, but also CC remote control only being available through the subscriptions, and subscriptions only being available for non-automated work, is an interesting intersection of decisions. Right now you can spin up a docker container or ec2 instance, log into CC over terminal, then control it afterwards via remote control, but that's quite janky compared to eg adding a CLI flag which serves CC remote control on a port where you configure auth options / can configure SSO - and that way would work fine with non-subscription API tokens, so you could hook up CI so that a test failure gives a CC instance with which to debug which is accessible via SSO to anyone on the dev team.
I work as a psychiatrist and feel that psychiatry as a skill and large accumulated set of many generations' wisdom about cognitive failure modes seems like an under exploited seam of understanding in AI safety.
There is a visceral reaction sometimes when considering some of the approaches to alignment I have looked up. Many seem unsafe with clinical experience highlighting why. People with ventromedial prefrontal cortex lesions may know what the right choice should be in different situations but make catastrophic personal and social decisions despite this...
Epistemic status: one possible scenario among many, low confidence that the extrapolation holds; but higher that it's worth taking seriously
Tl;dr: The US administration could escalate from (temporarily?) banning Mythos-class capabilities for non-Americans to a full isolationist "American-only" AI policy - cutting off the rest of the world and betting everything on accelerating its own economy. With reports that RSI is starting to compound this might be viable, depending on economic costs and which bo...