Sheikh Abdur Raheem Ali

Software Engineer (formerly) at Microsoft who may focus on the alignment problem for the rest of his life (please bet on the prediction market here).

Comments

How To Become A Mechanistic Interpretability Researcher
Sheikh Abdur Raheem Ali · 11d

Thank you for sharing this guide. I'm trying to understand how much we know about the typical thought processes that generate these common mistakes. I can't speak to the specific motivations or goals of any individual, but I'd speculate that if smart people consistently appear to make the same errors, there may be something more interesting going on that we can learn from.

I agree that avoiding compute-heavy steps is a good idea for those without much prior ML experience. Even if you have (or expect to acquire) the resources to afford a large training run, not knowing what you're doing almost always incurs a significant cost overhead, and the long iteration cycles bottleneck the number of experiments you can run during a sprint. That said, big GPU clusters are challenging to work with from an engineering perspective, so experience with, e.g., multi-GPU SFT helps develop tacit knowledge and skills that are highly sought after in industry roles. [1]

It's less clear to me why someone would try to build on a highly technical method when they don't meet the prerequisites to fully understand the paper's approach and limitations. It could be driven by higher-than-average self-belief and risk tolerance, since some overconfidence can lead to better outcomes and faster growth than perfect calibration. The people equipped to properly evaluate and review complex work are in short supply, yet they are disproportionately responsible for the most popular works, and it seems reasonable for someone who draws inspiration from a research direction to be naively excited about contributing to it. Public attention per paper follows a power-law distribution, with the most eyeballs on work that pushes the envelope of what's possible in the field, so the hardest-to-produce work is also the most visible, contrary to the intuition that rarity and prevalence should be inversely proportional.

It would be understandable if people who are primarily consumers of good solutions to hard technical problems tend to underestimate how hard those solutions are to generate. And the best attempt of someone whose foundation isn't quite there yet can look like cargo-culting surface-level features rather than a reasonable extension of prior work. But I'm not satisfied with this explanation and would be interested in hearing other perspectives on why people become susceptible to this category of errors.

  1. ^

    One possible factor is that in certain circles, taking a pile of cash and setting it on fire makes you cool because it shows you can do expensive things. Thankfully, the vast majority of researchers I know are quite responsible and strive to minimize waste, so I don't think that's what's going on here. I do think we should be careful to mentally separate "startup founder with access to an impressive million-dollar cluster" from "person who is qualified to run and debug jobs on an impressive million-dollar cluster".

Parker Conley's Shortform
Sheikh Abdur Raheem Ali · 12d

Don't do this. Meditating 6 hours a day is excessive, unless you derive some marginal value from it that I don't understand.

[Anthropic] A hacker used Claude Code to automate ransomware
Sheikh Abdur Raheem Ali · 16d

I am sorry, but according to my current understanding, your proposal may be illegal and potentially harmful. I reported this comment for moderator review.

Open Global Investment as a Governance Model for AGI
Sheikh Abdur Raheem Ali · 18d

I think that if you have a knack for ordinary software development, one application of that knack is to work at a tech company whose product already has, or eventually obtains, widespread adoption. This gives you a platform with a straightforward path towards helping improve the lives of hundreds of millions of people worldwide by a small amount. Claude has roughly 20-50 million monthly active users, and for most of them it appears to be beneficial overall, so I believe Anthropic meets this criterion.

If you capture a small fraction of the value you generate as a competent member of a reasonably effective team, that often leads to substantial financial returns, and I think this is fair, since the skillset and focus required to plan and execute such projects successfully is quite rare. The bar for technical hires at a frontier lab is high, and clearing it commands correspondingly competitive compensation in a market economy. You almost certainly had to clear an even higher bar (though one with less legible criteria) to be invited as an early investor. Capital appreciation is the standard reward for backing the production of a reliable and valuable service that others depend upon.

If you buy into the opportunity in AGI deployment, even the lower bound of mundane utility can be one of the most leveraged ways to do good in the world. Given the dangers of ASI development, improvements to the safety and alignment of AGI systems can prevent profound harm, and the importance of this cannot be overstated. Even in the counterfactual scenario where Anthropic was never founded, such work would still be urgent. There is established precedent for handling a profitable industry with negative externalities (tobacco, petroleum, advertising), and it would be consistent to include the semiconductor industry in that category. I agree that existing frameworks are insufficient for making reasonable decisions about catastrophic risks. These worries have shaped my career in AI safety, and a majority of the people here share your concerns.

However, I'm uncertain whether vilifying any small group of people would be the right move for achieving the strategic goals of the AI safety community. For example, Igor Babuschkin's recent transition from xAI to Babuschkin Ventures could have been complicated by an attitude of hostility towards the founders and early investors of AGI companies. Since nuanced communication doesn't work at scale, adopting this as our public position might inadvertently increase the likelihood of pivotal acts committed by rogue threat actors, with the inevitable media backlash identifying rationalist/EA people as culpable for "publishing radicalizing material". But taken seriously, that would be a fully general argument against distributing any online material warning of existential risks from advanced AI, and being vulnerable to that sort of error tends to exclude you from positions where your failures can cause real damage. So I think my real contention with such objections is not about strategy, but about principle.

I'd be much more comfortable with accountability falling on the faceless corporate entity rather than on individual members of the organization, because even senior employees with a lot of influence on paper may have limited agency in carrying out the demands of their role. I think it would be best to follow the convention set by criticism such as Anthropic is Quietly Backpedalling on its Safety Commitments and ryan_greenblatt's Shortform, which doesn't single out executives or researchers as responsible for the behavior of the system as a whole.

I have made exceptions to this rule in the past, but it has almost always degraded the quality of the discussion. When I was asked at an AI safety social for my opinion on Dario Amodei's essay The Urgency of Interpretability, I said I thought it was hypocritical, since a layoff at Anthropic UK had affected the three staff comprising their entire London interpretability team. That seems to contradict the essay's top-level takeaway that labs should invest in interpretability: if that were really the priority, you'd ideally be growing headcount on those teams rather than letting people go. But it's entirely possible Dario had no knowledge of this when writing the article, or that the hiring budget was reallocated to the U.S. branch of the interpretability team, or that offering relocation to other positions at the company wasn't practical for boring-and-complex HR/accounting reasons. The pace of great interpretability research coming out of Anthropic doesn't seem to have slowed down, so they're clearly still invested in it as a company. My hypothesis is that the extremely high financial returns are more a side effect of operating at that caliber of performance than a primary motivator for the talent. If they didn't get rich at Anthropic, they'd get rich at a hedge fund or a startup. The stacks of cash are not the issue here. The ambiguous future of the lightcone is.

It's possible that investors might be more driven by money, but I have less experience talking to them or watching how they work behind the scenes so I can't claim to know much about what makes them tick.

[This comment is no longer endorsed by its author]
peterbarnett's Shortform
Sheikh Abdur Raheem Ali · 19d

Nitter version of that thread: https://nitter.net/ESYudkowsky/status/1660623336567889920

I'm curious about the following line (especially in relation to a recent post, https://www.lesswrong.com/posts/vqfT5QCWa66gsfziB/a-phylogeny-of-agents)

If you tell me that the aliens grew out of scaled-up ant colonies, my probability that they're nice drops a lot (though not to 0%).

Why are scaled-up ant colonies unlikely to be nice? 

N Dimensional Interactive Scatter Plot (ndisp)
Sheikh Abdur Raheem Ali · 1mo

Sure, I sent you the file over Discord.

My notes from "Energy: A Beginner's Guide"
Sheikh Abdur Raheem Ali · 1mo

These notes are great, thanks for sharing them!

ABSOLUTE POWER (A short story)
Sheikh Abdur Raheem Ali · 1mo

Playing back keystrokes is fairly trivial. You can do it via Hammerspoon on macOS or AutoHotkey on Windows. Superwhisper even has an option to simulate keypresses, which works in apps that restrict use of the clipboard.
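
For anyone who wants to try it, here's a minimal sketch of the same idea in Python using pyautogui (assuming it's installed and a target window is focused); Hammerspoon and AutoHotkey expose equivalent primitives natively:

```python
# Minimal keystroke-playback sketch using pyautogui (pip install pyautogui).
# Roughly what Hammerspoon (macOS) or AutoHotkey (Windows) do natively.
import time
import pyautogui

time.sleep(3)  # give yourself a moment to focus the target application
pyautogui.write("hello from synthetic keypresses", interval=0.02)  # type text key by key
pyautogui.press("enter")       # tap a single key
pyautogui.hotkey("ctrl", "a")  # send a key combination (use "command" on macOS)
```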

Thought Anchors: Which LLM Reasoning Steps Matter?
Sheikh Abdur Raheem Ali · 1mo

For finding receiver heads, why do you use the kurtosis of the vertical attention scores across all layers?
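
To make the question concrete, here's my rough reconstruction of the statistic I'm asking about for a single head (my own sketch, not the paper's code; function and variable names are mine):

```python
# Rough sketch: score one attention head by the kurtosis of its "vertical"
# attention scores, i.e. how much attention each position receives on average
# from later query positions. High kurtosis => a few "receiver" positions dominate.
import numpy as np
from scipy.stats import kurtosis

def vertical_kurtosis(attn: np.ndarray) -> float:
    """attn: [num_queries, num_keys] attention weights, each row summing to 1."""
    vertical_scores = attn.mean(axis=0)      # attention received per key position
    return float(kurtosis(vertical_scores))  # excess kurtosis of that distribution

# Toy usage: rows drawn from a sparse Dirichlet so mass concentrates on few keys.
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.full(32, 0.05), size=32)
print(vertical_kurtosis(attn))
```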

Anthropic Lets Claude Opus 4 & 4.1 End Conversations
Sheikh Abdur Raheem Ali · 1mo

I am also strongly in favor of model welfare. I think that this feature is great and everyone should copy it. 

For the question of critical services, I would hope that we don't put AIs at current capability levels in charge of decisions important enough that ending the chat has a significant impact on operations, but it's true that Claude is being integrated into applications from healthcare to defense, and that isn't likely to stop soon. I will note that, as far as I know, this ability is currently only implemented on claude.ai, and it's an open question whether it will be introduced in the API or in Claude for Government.

My background is in security and compliance, where a common workflow is sending a request up the chain for elevated permissions, which a senior manager reviews before its execution is approved. This oversight prevents accidental footguns and lets us audit the system's usage logs to help ensure transparency and accountability across the board. If the model is concerned about potential integrity violations, it can file a report for further investigation using the same confidential channels as employees, e.g. https://www.microsoft.com/en-us/legal/compliance/sbc/report-a-concern. There are some limitations to this approach, but overall I think it works well.
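
As a toy illustration of that approval-gated pattern (all names, roles, and the in-memory "log" here are hypothetical stand-ins, not any real internal tooling):

```python
# Toy sketch of an approval-gated elevation request with an audit trail.
# Names and storage are hypothetical; a real system would use durable,
# tamper-evident logging and an actual identity/permission backend.
from datetime import datetime, timezone

AUDIT_LOG = []  # in practice: append-only and stored externally

def request_elevation(requester, resource, justification):
    req = {"requester": requester, "resource": resource,
           "justification": justification, "status": "pending",
           "requested_at": datetime.now(timezone.utc).isoformat()}
    AUDIT_LOG.append(("requested", dict(req)))
    return req

def review(req, approver, approve):
    req["status"] = "approved" if approve else "denied"
    req["approver"] = approver
    AUDIT_LOG.append(("reviewed", dict(req)))
    return req

def execute(req, action):
    if req["status"] != "approved":
        raise PermissionError("elevation was not approved")  # footgun prevented
    AUDIT_LOG.append(("executed", dict(req)))
    return action()

# Usage: the privileged action only runs after a second pair of eyes signs off.
r = request_elevation("alice", "prod-db", "rotate credentials")
review(r, approver="senior-manager", approve=True)
execute(r, lambda: print("credentials rotated"))
```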

Posts

Sheikh Abdur Raheem Ali's Shortform · 3y