Sheikh Abdur Raheem Ali

Software Engineer (formerly at Microsoft) who may focus on the alignment problem for the rest of his life (please bet on the prediction market here).

Comments

One percent of the world's AI compute (LLM-grade GPU capacity) is in the UAE, which does not have an AI Security Institute. I plan to spend 6-9% of my bandwidth this month (2-3 days during May 2025) on encouraging the UAE to establish an AISI. Today is the first day.

However, in my view, even the most optimistic estimate of the impact of successfully executing that plan is a shift of no more than 2% in the prediction market on the UAE starting an AI Security Institute before 2026. And even if a UAE AISI existed, it would likely be allocated only 1% to 5% (mode 2%) of the overall national AI budget (roughly $2b). Taking 2% of 2% of $2b gives a maximum valuation of $800k for the entire project. (I think the median valuation would be significantly lower; I'm using the maximum not to be generous, but because I believe that, for this system, the max value is more informative for decision making and easier to estimate than the 95th or 99.9th percentile value.)

I was talking about this with my dad earlier, whose take was that attending the one-day https://govaisummitmea.com/ on May 13th would be less than 0.01% of the work involved in actually pulling this off. My understanding of what he meant, in more formal terms: if your goal is for the UAE to have an AISI before 2026, and you decompose each step of the plan to achieve that outcome into players in a Shapley value calculation, then acquiring these tickets has an average marginal contribution of at most 0.0001 times at most $800k, which is $80. And it would be foolish to pay the cost of one day of my time, plus tickets for me and a collaborator, when the return on that investment is, by this model, capped at $80.
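To make the arithmetic above explicit, here is a minimal sketch of the Fermi estimate; the budget figure and percentages are the assumptions stated above, not measured quantities:

```python
# Fermi estimate for the value of the UAE-AISI push, using the assumptions above.
national_ai_budget = 2e9   # rough UAE national AI budget, USD (assumed)
aisi_budget_share = 0.02   # mode of the assumed 1-5% allocation an AISI might get
prob_shift = 0.02          # most optimistic shift in P(UAE AISI before 2026)

max_project_value = prob_shift * aisi_budget_share * national_ai_budget
print(f"Maximum project valuation: ${max_project_value:,.0f}")  # $800,000

# Abba's model: the summit tickets are at most 0.01% of the work.
ticket_marginal_share = 0.0001
print(f"Return capped at: ${ticket_marginal_share * max_project_value:,.0f}")  # $80
```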

Although my dad's take here is reasonable given the information available to him, he doesn't have the full context, and the time he can allocate to learning about this project is limited. Even though he is burdened by the demands of his various responsibilities, I'm grateful that supporting me in particular is one that he has always prioritized. I love Abba!

Here's why Abba is wrong. There are 400 total seats. The cost is USD 1299 per head, so roughly USD 2600 to register two attendees. At this price point it makes sense for a company to reimburse the fee and send representatives if it profits from building relationships with UAE policymakers. These will mostly be orgs not working directly on reducing AI x-risk. Although having 0.5% of attendees be alignment researchers is unlikely to affect the overall course of discussions, it is a counterweight against worlds where this minority group has zero participation in these conversations. I think it may be as much as 3.25% of the work needed, which is ten times the break-even floor of 0.325% (2600/800k). But besides that, my team has been approved for up to 5 seats, we have a 10-page draft policy memo coauthored with faculty at https://mbrsg.ae/, and we can simply ask for the registration fee to be waived. I agree that it would be insane to pay the full amount out of pocket. (Edit: we got the waiver request approved.)
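For completeness, the break-even comparison works out as follows; the 3.25% figure is my own estimate of our share of the work, not a measured quantity:

```python
registration_cost = 2 * 1299      # two attendees, USD (~$2,600)
max_project_value = 800_000       # from the estimate above

break_even_share = registration_cost / max_project_value
print(f"Break-even share of the work: {break_even_share:.3%}")  # ~0.325%

my_estimated_share = 0.0325       # my guess: up to 3.25% of the work needed
print(f"Multiple of break-even: {my_estimated_share / break_even_share:.1f}x")  # ~10x
```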

Here’s why Abba is right. This post was written after midnight. Later today I will go to the Middle East and Africa’s largest cybersecurity event, https://gisec.ae/. I look forward to further comments and reviews.

Thread: Research Chat with Canadian AI Safety Institute Leadership

I’m scheduled to meet https://cifar.ca/bios/elissa-strome/ from Canada’s AISI for 30 mins on Jan 14 at the CIFAR office in MaRS.

My plan is to share alignment/interp research I’m excited about, then mention upcoming AI safety orgs and fellowships which may be good to invest in or collaborate with.

So far, I’ve asked for feedback and advice in a few Slack channels. I thought it may be valuable to get public comments or questions from people here as well.

Previously, Canada invested $240m into a capabilities startup: https://www.canada.ca/en/department-finance/news/2024/12/deputy-prime-minister-announces-240-million-for-cohere-to-scale-up-ai-compute-capacity.html. If your org has some presence in Toronto or Montreal, I’d love to have permission to give it a shoutout!

Elissa is the lady on the left in the second image from this article: https://cifar.ca/cifarnews/2024/12/12/nicolas-papernot-and-catherine-regis-appointed-co-directors-of-the-caisi-research-program-at-cifar/.

My input is of negligible weight, so I wish to coordinate messaging with others.

AIs already seem to not want us to do things we find deeply abhorrent; even when they don't refuse harmful queries, it is still possible in some cases for them to fight a user's intent. Here is an interesting case study, though the source is questionable enough that I am unsure of its usefulness as a citation: https://www.reddit.com/r/Chub_AI/comments/1ljbjk1/how_to_get_deepseek_not_to_act_so_defiant_to_the/

Congratulations on MATS!

I would like to know how you got involved with that ITDA work.

Patrick shared a draft with Stepan Shabalin, who shared it with me. We had collaborated earlier on another project (https://arxiv.org/abs/2505.03189), which was led by my former SPAR student Yixiong, so it made sense to work together again.

Thanks for reading the projects in such depth; I honestly didn't expect anyone would.

Oh, not at all; I only took a quick look through everything and could have spent more time on the details. Until now, I didn't even notice that https://github.com/BorisTheBrave/itda-coders is a private repo which I cannot access.

I decided to put in an application. Elapsed time to fill it out was 6 hours, from 9 PM to 3 AM local time. Only one data point, but I'm probably slower than the median by a fair margin.

Strong upvote. Failures are very common and people should write more posts about them. I'm impressed by all the outputs you've had in this short period of time. You are set up for success upon return from sabbatical!

Here are some quick thoughts on the post itself:

Sorry, to me the title is slightly misleading; I wouldn't really consider any of the projects described in this post to be failures unless you're being exceedingly humble, because you have writeups and code for each of them, which is further than most failed projects get. Despite your caveat that these are "those projects which still have something to say", I'd be even more curious about ones from further down the stack which "died" earlier. When reading posts like these I look for snippets that might be embarrassing for the author, things like "I made an obvious blunder or bad decision", and I didn't really find that here, which makes the post less relatable (though the short length means this doesn't detract from the quality of the writing). Perhaps that was a non-goal. The closest might be that bool facts can be memorized with ~2 bits per parameter, but that's not a trivial result.

That might also appeal to a broader audience, since a lot of people are moving into the field and they tend to repeat the same mistakes at first. Some common errors are well known and generic, but other widespread issues are invisible (or noise!) to the people best equipped to address them. But that's alright if you were aiming this post at a smaller, more senior audience rather than a larger, more junior one.

Here are some quick thoughts on the projects:

  1. In Appendix A of your Google Doc on multiplication experiments, it would be helpful if the file names were hyperlinks to the Jupyter notebooks. You do link to the GitHub repo at the top, so this isn't really a big deal, since the work is 4 months old at this point and probably won't have many people taking a look at it. But a few other internal sections link to that appendix, so it's an attractor for readers to fall into, and if you use this structure for future documents then it'd be natural to want the code linked from the notes.
  2. Only tangentially relevant, but this file https://github.com/BorisTheBrave/llm-addition-takehome/blob/main/notes.txt stood out to me as perfect for prompting base models. It's been a while since I've touched gpt-4-base but this is the sort of thing it was most helpful for back when I was heavily using loom: https://manifold.markets/market/will-an-ai-get-gold-on-any-internat?tab=comments#HcayZchcTfCdFQb0Q6Dr.
  3. (Please correct me if I'm wrong on this.) It is surprising to me that you don't seem to have technical collaborators, except on the do no harm post. This makes sense for the submission to Neel Nanda's MATS stream, since that is meant to be a test of your personal skills, but for most other codebases I typically see at least 1-2 people besides the project lead making code contributions to the repo. On your website I found a link to a Discord server with 600+ members; surely at least some of them would be excited about teaming up with you? Maybe if you're especially conscientious you wouldn't want to risk wasting people's time, or there's some inherent tradeoff between networking and problem solving, but teamwork is a valuable skill to develop. Of course, this is a case where people might give you the opposite advice if all of your projects were done in a large group, so take it with a grain of salt. Based on this post alone, if you can share a concrete proposal and plan for future work, I'd be happy to hop on a call and connect you to individuals or programs which might be a good fit to support your research.
  4. We recently published some research comparing ITDA and SAEs for finding features in diffusion models: https://arxiv.org/abs/2505.24360. I'm not first author on this work, and I don't fully understand our results, since my role was mostly minor tasks such as helping to run experiments or fix citations, but it might be worth skimming in case you're still curious about what other people have been doing with the method.

Finally, this section made me laugh:

Neel Nanda directly told me that it is "cursed".

 So I made a toy model trained to memorise a set of boolean facts chosen at random, and investigated the circuits and the scaling laws.

Thanks for writing this up. I really appreciate this post because I was confused about the intuition behind variance explained, despite it being the primary evaluation metric in a recent paper I co-authored on interpreting text-to-image diffusion models with dictionary learning. This post is more helpful than any other resource I've used.
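For anyone else who wants the concrete formula: variance explained is just one minus the fraction of variance left in the residuals after reconstruction. A minimal sketch is below (the arrays are placeholders, not data from either paper, and conventions differ slightly across papers, e.g. whether means are subtracted):

```python
import numpy as np

def variance_explained(acts: np.ndarray, recon: np.ndarray) -> float:
    """Share of the variance in `acts` captured by `recon`.

    1.0 means a perfect reconstruction; 0.0 means the reconstruction
    does no better than predicting the mean activation.
    """
    residual_var = np.var(acts - recon)
    total_var = np.var(acts)
    return 1.0 - residual_var / total_var

# Toy usage: a slightly noisy reconstruction of random "activations".
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))
recon = acts + 0.1 * rng.normal(size=acts.shape)
print(variance_explained(acts, recon))  # close to 1.0 (roughly 0.99)
```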

I love this post. I want to memorize the entirety of it and be able to recite the contents verbatim on command.

When I saw your post I initially thought it had been written in response to this one, so the disclaimer in this comment was helpful for me!
