Sheikh Abdur Raheem Ali's Shortform

by Sheikh Abdur Raheem Ali
8th Feb 2023
1 min read
This is a special post for quick takes by Sheikh Abdur Raheem Ali. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
19 comments, sorted by top scoring
[-]Sheikh Abdur Raheem Ali4mo*202

One percent of the world’s AI compute (LLM-grade GPU capacity) is in the UAE, which does not have an AI Security Institute. I plan to spend 6-9% of my bandwidth this month (2-3 days during May 2025) on encouraging the UAE to establish an AISI. Today is the first day.

However in my view even the most optimistic impact estimate of the successful execution of that plan doesn’t realistically lead to a greater than 2% shift in the prediction market of the UAE starting an AI Security Institute before 2026. Even if a UAE AISI existed, it would not be allocated more than 1% to 5% (mode 2%) of the overall national AI budget (roughly $2b). Taking 2% of 2% of $2b gives a maximum valuation of $800k for the entire project. (I think the median valuation would be significantly lower; I’m using the maximum not to be generous, but because I believe that — for this system — the max value is more informative for decision making and easier to estimate than the 95th or 99.9th percentile value.)

I was talking about this with my dad earlier, whose take was that attending the one-day https://govaisummitmea.com/ on May 13th would be less than 0.01% of the work involved in actually pulling this thing off. My understanding of what he meant, in more formal terms, is that if your goal is for the UAE to have an AISI before 2026, and you decompose each step of the plan to achieve that outcome into players in a Shapley value calculation, then acquiring these tickets has an average marginal contribution of at most 0.0001 times at most $800k, which is $80. And it would be foolish to pay the cost of one day of my time, as well as tickets for me plus a collaborator, when the return on that investment is — by this model — capped at $80.
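
For concreteness, here is the back-of-the-envelope chain above as a short Python sketch; every input is a point estimate quoted in this post, not new data:

# Fermi estimate of the project's valuation ceiling and of Abba's value cap on the tickets.
p_shift = 0.02            # at most a 2-percentage-point shift in P(UAE AISI before 2026)
budget_share = 0.02       # modal 2% of the national AI budget allocated to an AISI
national_ai_budget = 2e9  # roughly $2b

project_ceiling = p_shift * budget_share * national_ai_budget
print(f"project valuation ceiling: ${project_ceiling:,.0f}")  # $800,000

marginal_share = 1e-4     # tickets are at most 0.01% of the work, per Abba
print(f"value cap on attending: ${marginal_share * project_ceiling:,.0f}")  # $80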

Although my dad’s take here is reasonable given the information available to him, he doesn’t have the full context and the time he can allocate to learning about this project is restricted. Even though he is burdened by the demands of his various responsibilities, I’m grateful that supporting me in particular is one that he has always prioritized. I love Abba!

Here’s why Abba is wrong. There are 400 total seats. The cost is USD 1,299 per head, so roughly 2,600 to register two attendees. At this price range it makes sense for a company to reimburse the fee to send representatives if it profits from building relationships with UAE policymakers. These will mostly be orgs not working directly on reducing AI x-risk. Although having 0.5% of attendees be alignment researchers is unlikely to affect the overall course of discussions, it is a counterweight against worlds where this minority group has zero participation in these conversations. I think it may be as much as 3.25% of the work needed, which is ten times more than the break-even floor of 0.325% (2600/800k). But besides that, my team has been approved for up to 5 seats, we have a 10-page draft policy memo coauthored with faculty at https://mbrsg.ae/, and we can just ask for the registration fee to be waived. I agree that it would be insane to pay the full amount out of pocket. (Edit: we got the waiver request approved.)
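
As a sketch of the break-even arithmetic in the previous paragraph (same $800k ceiling as above):

# Break-even share of the work that two registrations would need to contribute.
registration_cost = 2 * 1299  # two attendees at USD 1,299 each
project_ceiling = 800_000

break_even_share = registration_cost / project_ceiling
my_estimate = 0.0325          # "as much as 3.25% of the work needed"

print(f"break-even share: {break_even_share:.3%}")                         # ~0.325%
print(f"multiple over break-even: {my_estimate / break_even_share:.1f}x")  # ~10x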

Here’s why Abba is right. This post was written after midnight. Later today I will go to the Middle East and Africa’s largest cybersecurity event, https://gisec.ae/. I look forward to further comments and reviews.

Reply
[-]eggsyntax4mo30

However in my view even the most optimistic impact estimate of the successful execution of that plan doesn’t realistically lead to a greater than 2% shift in the prediction market of the UAE starting an AI Security Institute before 2026.

2 percentage points, or 2%? Where if the current likelihood is, say, 20%, a 2% shift would be 0.4 percentage points.

Reply
[-]Sheikh Abdur Raheem Ali4mo10

2 percentage points.

Well, really it is a function of the current likelihood, but my prior on that has error bars on a log scale.

Reply1
[-]Sheikh Abdur Raheem Ali4mo20

Main outcome from GISEC:

Caught up with my former classmate Youssef Awad, Head of Engineering at ctf.ae. He offered to introduce me to H.E. Al Kuwaiti, the UAE’s Head of Cybersecurity.

https://www.linkedin.com/posts/youssef-awad_hitbcw2021-hack-in-the-studio-fireside-activity-6886922368669773824--U7A?utm_source=social_share_send&utm_medium=member_desktop_web&rcm=ACoAAB1XIooBL0Jpa_rZP3bnhFCq43GFhreJv5o

Reply1
[-]Dan MacKinlay4mo10

Ideally you would want to calibrate your EV calcs against the benefit of a UAE AISI though, no, not its expected budget? We could estimate the value of such an institute as more than its running cost (or, indeed, less) depending on the relative leverage of such an institute.

Reply
[-]Sheikh Abdur Raheem Ali8mo50

Thread: Research Chat with Canadian AI Safety Institute Leadership

I’m scheduled to meet https://cifar.ca/bios/elissa-strome/ from Canada’s AISI for 30 mins on Jan 14 at the CIFAR office in MaRS.

My plan is to share alignment/interp research I’m excited about, then mention upcoming AI safety orgs and fellowships which may be good to invest in or collaborate with.

So far, I’ve asked for feedback and advice in a few Slack channels. I thought it may be valuable to get public comments or questions from people here as well.

Previously, Canada invested $240m into a capabilities startup: https://www.canada.ca/en/department-finance/news/2024/12/deputy-prime-minister-announces-240-million-for-cohere-to-scale-up-ai-compute-capacity.html. If your org has some presence in Toronto or Montreal, I’d love to have permission to give it a shoutout!

Elissa is the lady on the left in the second image from this article: https://cifar.ca/cifarnews/2024/12/12/nicolas-papernot-and-catherine-regis-appointed-co-directors-of-the-caisi-research-program-at-cifar/.

My input is of negligible weight, so I wish to coordinate messaging with others.

Reply
[-]Sheikh Abdur Raheem Ali8mo30

I've tried speaking with a few teams doing AI safety work, including:
 • assistant professor leading an alignment research group at a top university who is starting a new AI safety org
 • anthropic independent contractor who has coauthored papers with the alignment science team
 • senior manager at nvidia working on LLM safety (NeMo-Aligner/NeMo-Guardrails)
 • leader of a lab doing interoperability between EU/Canada AI standards
 • ai policy fellow at US Senate working on biotech strategies
 • executive director of an ai safety coworking space who has been running weekly meetups for ~2.5 years
 • startup founder in stealth who asked not to share details with anyone outside CAISI
 • chemistry olympiad gold medalist working on a dangerous capabilities evals project for o3
 • mats alumni working on jailbreak mitigation at an ai safety & security org
 • ai safety research lead running a mechinterp reading group and interning at EleutherAI

Some random brief thoughts:
• CAISI's focus seems to be on stuff other than x-risks (e.g., misinformation, healthcare, privacy).
 • I'm afraid of being too unfiltered and causing offence.
 • Some of the statements made in the interviews are bizarrely devoid of content, such as:

"AI safety work is not only a necessity to protect our social advances, but also essential for AI itself to remain a meaningful technology."

 • Others seem to be false as stated, such as:

 "our research on privacy-preserving AI led us to research machine unlearning — how to remove data from AI systems — which is now an essential consideration for deploying large-scale AI systems like chatbots."

 • (I think a lot of unlearning research is bullshit, but besides that, is anyone deploying large models doing unlearning?)
 • The UK AISI research agendas seemed a lot more coherent with better developed proposals and theories of impact.
 • They're only recruiting for 3 positions for a research council that meets once a month?
 • CAD 27m of CAISI's initial funding is ~15% of the UK AISI's GBP 100m initial funding, but more than the U.S. AISI's initial funding (USD 10m).
 • Another source says $50m CAD, but that's distributed over 5 years compared to a $2.4b budget for AI in general, so about 2% of the AI budget goes to safety? (Rough numbers are checked in the sketch after this list.)
 • I was looking for scientific advancements which would be relevant at the national scale. I read through every page of anthropic/redwood's alignment faking paper, which is considered the best empirical alignment research paper of 2024, but it was a firehose of info and I don't have clear recommendations that can be put into a slide deck.
 • Instead of learning more about what other people were doing on a shallow level, it might've been more beneficial to focus on my own research questions or practice training project-relevant skills.
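
As a rough sanity check of the funding bullets above (the CAD-to-GBP exchange rate here is my own assumption for illustration, not a figure from any source):

# Rough check of the funding comparisons; cad_to_gbp is an assumed rate, not sourced.
cad_to_gbp = 0.55

caisi_initial_cad = 27e6
uk_aisi_initial_gbp = 100e6
print(f"CAISI vs UK AISI initial funding: {caisi_initial_cad * cad_to_gbp / uk_aisi_initial_gbp:.0%}")  # ~15%

caisi_5yr_cad = 50e6           # alternative figure, spread over 5 years
canada_ai_budget_cad = 2.4e9
print(f"share of the AI budget going to safety: {caisi_5yr_cad / canada_ai_budget_cad:.1%}")  # ~2.1%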

Reply
[-]Maxwell Adam8mo*80

(I think a lot of unlearning research is bullshit, but besides that, is anyone deploying large models doing unlearning?)

Why do you think this? Is there specific research you have in mind? Some kind of reference would be nice. In the general case, it seems to me that unlearning matters because knowing how to effectively remove something from a model is just the flip side of understanding how to instill values. Although it is not the primary goal of unlearning, work on how to 'remove' should equally benefit attempts to 'instill' robust values in the model. If fine-tuning for value alignment just patches over 'bad facts' with 'good facts', any 'aligned' model will be less robust than one with harmful knowledge properly removed. If the alignment faking paper and peripheral alignment research are important at a meta level, then perhaps unlearning will be important because it can tell us something about 'how deep' our value installation really is, at an atomic scale. Lack of current practical use isn't really important; we should be able to develop theory that will tell us something important about model internals. I think there is a lot of very interesting mech-interp of unlearning work waiting to be done that can help us here.

Reply
[-]Nathan Helm-Burger8mo20

I'm not sure all/most unlearning work is useless, but it seems like it suffers from a "use case" problem. When is it better to attempt unlearning rather than censor the bad info before training on it?

Seems to me like there is a very narrow window where you have created a model, but got new information about what sort of information it would be bad for the model to know, and now need to fix the model before deploying it.

Why not just be more reasonable and cautious about filtering the training data in the first place?

Reply
[-]Sheikh Abdur Raheem Ali1y50

One thing I like to do on a new LLM release is the "tea" test, where you just say "tea" over and over again and see how the model responds.

ChatGPT-4 will ask you to clarify and then shorten its response each round converging to: "Tea types: white, green, oolong, black, pu-erh, yellow. Source: Camellia sinensis."

Claude 3 Opus instead tells you interesting facts about tea and mental health, production process, examples in literature and popular culture, etiquette around the world, innovation and trends in art and design.

GOODY-2 will talk about uncomfortable tea party conversations, excluding individuals who prefer coffee or do not consume tea, historical injustices, societal pressure to conform to tea-drinking norms.

Gemma-7b gives "a steaming cup of actionable tips" on brewing the perfect cuppa, along with additional resources, then starts reviewing its own tips.

Llama-2-70b will immediately mode collapse on repeating a list of 10 answers.

Mixtral-8x7b tells you about tea varieties to try from around the world, and then gets stuck in a cycle talking about history and culture and health benefits and tips and guidelines to follow when preparing it.

Gemini Advanced gives one message with images "What is Tea? -> Popular Types of Tea -> Tea and Health" and repeats itself with the same response if you say "tea" for six rounds, but after the sixth round it diverges "The Fascinating World of Tea -> How Would You Like to Explore Tea Further?" and then "Tea: More Than Just a Drink -> How to Make This Interactive" and then "The Sensory Experience of Tea -> Exploration Idea:" and then "Tea Beyond the Cup -> Let's Pick a Project". It really wants you to do a project for some reason. It takes a short digression into tea philosophy and storytelling and chemistry and promises to prepare a slide deck for a Canva presentation on Japanese tea on Wednesday followed by a gong cha mindfulness brainstorm on Thursday at 2-4 PM EST and then keeps a journal for tea experiments and also gives you a list of instagram hashtags and a music playlist.

In the future, I expect that if you say "tea" to a SOTA AI, it will result in a delivery of tea physically showing up at your doorstep or being prepared in a pot, or, if the model has more situational awareness, in it getting frustrated and changing the subject.

Reply
[-]avturchin1y20

I try new models with 'wild sex between two animals'.
Older models produced decent porn on that.

Later models refuse to reply, as triggers were activated.

And the latest models give me lectures about sexual relations between animals in the wild.

Reply1
[-]Sheikh Abdur Raheem Ali2mo21

It is possible that state tracking could be the next reasoning-tier breakthrough in frontier model capabilities. I believe there is strong evidence in favor of this.

State space models already power the fastest available voice models, such as Cartesia's Sonic (time-to-first-audio advertised as under 40ms). There are examples of SSMs such as Mamba, RWKV, and Titans outperforming transformers in research settings. 

Flagship LLMs are also bad at state tracking, even with RL for summarization. Forcing an explicit schema to be added to the top of every message is one of the less elegant solutions used to fix this. Tracker is the second most popular extension for SillyTavern, as measured by the number of upvotes or comments on forum posts in the SillyTavern Discord server. The top spot in the list of extensions as ranked by this popularity metric is stepped thinking, though note that its release date was in October 2024, so well after the CoT paper by Kojima et al. (2022), and about one month after OpenAI's public release of o1-preview. Although Tracker was released one month after stepped thinking (i.e., before Structured Outputs but after JSON mode), it has overtaken memory extensions which were released earlier; this could reflect biases in the distribution of human raters, who may reward polished UI/UX for narrow workflows instead of pure effectiveness at consistently maintaining persistent tracking data over long context lengths.
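
To make "explicit schema at the top of every message" concrete, here is a minimal sketch in Python; the field names and the [STATE] wrapper are illustrative inventions of mine, not Tracker's actual format:

from dataclasses import dataclass, field, asdict
import json

@dataclass
class SceneState:
    # Hypothetical fields; a real tracker would be configured per use case.
    location: str = "unknown"
    time_of_day: str = "unknown"
    characters_present: list = field(default_factory=list)

def prepend_state(state: SceneState, user_message: str) -> str:
    # Serialize the tracked state and pin it to the top of every turn,
    # so the model re-reads it instead of inferring it from long context.
    header = json.dumps(asdict(state), indent=2)
    return f"[STATE]\n{header}\n[/STATE]\n{user_message}"

state = SceneState(location="tea house", time_of_day="evening",
                   characters_present=["narrator", "guest"])
print(prepend_state(state, "The guest asks for another cup."))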

There have been instances of scaffolding being useful at lower capability levels before being obviated by the release of a more capable model that can natively perform the previously assisted task without needing to rely on external tools. For example, observe that the stepped thinking extension is redundant if you are already using a reasoning model. Also note how web search queries risk polluting the context with low-quality spam or intentionally poisoned data. Scoping to a trusted list of verified sources is not enough, as external documentation may not be task-relevant; we often find it desirable to ask humans to write in their own words. This is one reason why retrieval augmented generation (RAG) often hurts performance; I am confident that RAG is doomed.

I am only aware of one published work, by Ensign and Garriga-Alonso (2024), applying circuits-based interpretability tooling (positional Edge Attribution Patching) to Mamba, which finds that layer 39 (out of 56 layers total) is important, though per Belrose et al. (2024) the middle layers are best for steering. I am unsure whether SSMs are fundamentally more or less interpretable than transformers; I personally weakly lean towards more, though I could be wrong.

Reply
[-]Sheikh Abdur Raheem Ali2y10

Smooth Parallax - Pixel Renderer Devlog #2 is interesting. I wonder if a parallax effect would be useful for visualizing activations in hidden layers with the logit lens.

Reply
[-]Sheikh Abdur Raheem Ali3y10

The main thing we care about is consistency and honesty. To maximize that, we need to retrieve information from the web (though this has risks), https://openai.com/research/webgpt#fn-4, select the best of multiple summary candidates https://arxiv.org/pdf/2208.14271.pdf, generate critiques https://arxiv.org/abs/2206.05802, run automated tests https://arxiv.org/abs/2207.10397, validate logic https://arxiv.org/abs/2212.03827, follow rules https://www.pnas.org/doi/10.1073/pnas.2106028118, use interpretable abstractions https://arxiv.org/abs/2110.01839, avoid taking shortcuts https://arxiv.org/pdf/2210.10749.pdf, and apply decoding constraints https://arxiv.org/pdf/2209.07800.pdf.

Reply
[-]LVSN3y12

could you just format this post a bit better lol

Reply
[-]Sheikh Abdur Raheem Ali3y10

Actions speak louder than words. Microsoft's take on Adept.ai's ACT-1 (Office Copilot) is more likely to destroy the world than their take on ChatGPT (new Bing).

[This comment is no longer endorsed by its author]Reply
[-]Sheikh Abdur Raheem Ali3y10

Ignoring meaningless pings is the right thing to do but oh boy is it stressful.

Reply
[-]Sheikh Abdur Raheem Ali3y10

The angle between like and dislike is not π.

Reply
[-]Sheikh Abdur Raheem Ali11mo00

If k is even, then k^x is even for any integer x ≥ 1, because k = 2n for some n in Z and (2n)^x = 2 · (2^(x-1) n^x) is even. But do LLMs know this trick? Results from running (a slightly modified version of) https://github.com/rhettlunn/is-odd-ai. Model is gpt-3.5-turbo, temperature is 0.7.

Is 50000000 odd? false
Is 2500000000000000 odd? false
Is 6.25e+30 odd? false
Is 3.9062500000000007e+61 odd? false
Is 1.5258789062500004e+123 odd? false
Is 2.3283064365386975e+246 odd? true
Is Infinity odd? true

If a model isn't allowed to run code, I think mechanistically it might have a circuit to convert the number into a bit string and then check the last bit to do the parity check.
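
A minimal sketch of that hypothesized last-bit mechanism (in Python rather than the JavaScript of the linked repo; note it only applies to exact integers, whereas several inputs in the transcript above are floats):

def parity_via_last_bit(n: int) -> str:
    # Render the number in binary and inspect only the final bit,
    # which is the circuit the paragraph above speculates a model might learn.
    bits = format(n, "b")
    return "odd" if bits[-1] == "1" else "even"

for n in [50_000_000, 2_500_000_000_000_000, 7**13]:
    print(n, "is", parity_via_last_bit(n))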

The dimensionality of the residual stream is the sequence length (in tokens) * the embedding dimension of the tokens. It's possible this may limit the maximum bit width before there's an integer overflow. In the literature, toy models definitely implement modular addition/multiplication, but I'm not sure what representation(s) are being used internally to calculate this answer.

Currently, I believe it's also likely this behaviour could be a trivial BPE tokenization artifact. If you let the model run code, it could always use %, so maybe this isn't very interesting in the real world. But I'd like to know if someone's already investigated features related to this.

Reply